S1 2 (LOGWORTH? OR LOG()WORTH) AND (OLAP OR ANALYTICAL()PROCESS? OR SPLITTING (3N) DATA)
File 340:CLAIMS(R)/US Patent 1950-04/Dec 14
(c) 2004 IFI/CLAIMS(R)
File 654:US Pat.Full. 1976-2004/Dec 14
(c) Format only 2004 The Dialog Corp



1/3,K/1 (Item 1 from file: 340)

DIALOG (R) File 340 : CLAIMS (R) /US Patent 
10155940 2002-0099581
Inventors: Chapman Tonya Kelsey (US); Chu Chengwen Robert (US); Tideman Susan Christine (US)
Assignee: Unassigned Or Assigned To Individual 
Assignee Code: 68000 

Publication Application 

Number Kind Date Number Date 
Priority Applic: US 2001766789 20010122

Abstract: ...decision tree processing module determines a subset of the
dimension variables to split the input data. The splitting of the
dimension variables predicts the target variable. A multi-dimension viewer
generates a report...

Exemplary Claim: ...module connected to the data store that determines a
subset of the dimension variables for splitting the input data,
wherein the splitting by the dimension variable subset predicts the
target variable; and a multi-dimension viewer that...

Non-exemplary Claims: ...6. The apparatus of claim 5 wherein the statistic
measure is a logworth statistic measure... variables and at least one
target variable; determining a subset of the dimension variables for
splitting the input data, wherein the splitting using the dimension
variable subset predicts the target variable; and generating a report
using the...

...39. The method of claim 38 wherein the statistic measure is a logworth
statistic measure... stored input data; after receiving the request,
determining a subset of the dimension variables for splitting the
input data, wherein the splitting using the dimension variable
subset predicts the target variable; displaying the determined dimension
variables subset...



1/3,K/2 (Item 1 from file: 654)

DIALOG (R) File 654: US Pat. Full. 

(c) Format only 2004 The Dialog Corp. All rts. reserv. 

0005044888 **IMAGE Available**
Derwent Accession: 2002-656219 
Inventor: Chengwen Chu, INV
          Susan Tideman, INV
          Tonya Chapman, INV
             Publication                  Application/Filing
             Number          Kind Date     Number        Date
Main Patent  US 20020099581  A1   20020725 US 2001766789 20010122



Fulltext Word Count: 15497
Abstract:

...decision tree processing module determines a subset of the dimension
variables to split the input data. The splitting of the dimension
variables predicts the target variable. A multi-dimension viewer
generates a report...



Summary of the Invention: 

...of transactional data that are generally stored in a data warehouse
or an On-Line Analytical Processing (OLAP) system. This transactional
data contains information on the outcomes of enterprise operations. For
example...

Description of the Drawings: 

...FIG. 3 is a graphical user interface that depicts the recommended
dimensions and their associated logworths, as displayed before the run
...FIG. 11 is a graphical user interface that depicts the recommended
dimensions and their associated logworths, as displayed after the run



Description of the Invention: 

...preselected target(s). The results are displayed to the user 56
through an On-Line Analytical Processing (OLAP) viewer 54... format.
An OLAP viewer 54 displays the OLAP cubes 50 to the user 56. An OLAP
cubes index 52 is provided in the model repository 40 so that the user 56
may more easily determine which dimensions and other data are used within
an OLAP cube... details of the decision tree algorithm to view
automatically the determined data groupings with the OLAP viewer 54.
The marketing analyst is now able to examine data that may contain
hundreds... FIG. 3 is a graphical user interface depicting the
recommended variables 82 and their associated logworths 84 in window
80. The logworth of a recommended variable is a measure of the strength
of the corresponding rule generated by the decision tree processing
module. For a categorical target variable, logworth is defined as
logworth = -log(p-value from the chi-square test). For an interval target
variable, logworth is defined as logworth = -log(p-value from the F
test). "Good" splitting variables are ones that have large values of
logworth. By viewing window 80, the user can determine whether each
recommended variable's logworth has the strength the enterprise
requires for it to be used as a dimension variable for...
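The sketch below makes the logworth definition concrete for the simplest case: a binary split of a binary categorical target, summarized as a 2x2 table of counts. It assumes the Pearson chi-square test without continuity correction and a base-10 logarithm, which is what SAS Enterprise Miner reports. The function names are illustrative and do not come from the patent. The F-test case for interval targets is not shown.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    Rows are the two branches of a candidate split; columns are the counts of
    the two target levels (event, non-event).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # an empty row or column contributes nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def logworth(table):
    """logworth = -log10(p-value) of the chi-square test (1 degree of freedom)."""
    stat = chi_square_2x2(table)
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    if p_value == 0.0:  # underflow for extremely strong splits
        return math.inf
    return -math.log10(p_value)


# A split whose branches have very different event rates earns a larger logworth.
print(logworth([[22, 18], [18, 22]]))  # weak split
print(logworth([[30, 10], [10, 30]]))  # strong split
```

The strong split (chi-square 20) has a p-value of about 8e-6, so its logworth is a little above 5. The weak split stays below 1.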

...Number of Recommended Dimensions" field in FIG. 2) if each recommended
variable had a significant logworth. If the user selects a recommended
variable and activates the "Modify Rule" button 86, then...

...method for a particular recommended variable, as shown in FIG. 4 and 
FIG. 5, the logworth for that variable is updated accordingly... 
[0050] FIG. 7 is a graphical user interface of the OLAP tables window
120. The present invention creates an OLAP cubes data store and the user
may choose to create one or more sub-tables from the OLAP cubes data
store of the following types... mailing that did not result in a
purchase). The variables that represent the user-selected OLAP
dimensions are "Statecode" 132 and "Gender" 134. The top horizontal
header 132 identifies the code... [0067] FIG. 11 is a graphical user
interface of the recommended variables and their associated logworths.
This window 170 is identical to the one in FIG. 3 and is accessed by...
FIG. 14 is one of the graphical user interfaces 200 of the data in the
OLAP cube data store, displayed as a multidimensional bar chart. The
recommended dimension variable with the largest logworth is
automatically selected as the horizontal axis variable. The recommended
variable with the second largest logworth is automatically selected as
the forward axis variable. The user may change the horizontal and... does
make such a selection, then block 226 creates and displays a table
containing the logworth of each recommended variable (as displayed in
FIG. 3). The table includes each recommended variable 82 along with that
recommended variable's associated logworth 84 in addition to the name
of the corresponding input variable... overridden, user-defined, split
value. Block 234 uses the new split value to update the logworth of the
corresponding recommended variable (as shown in FIG. 3) by using the
method... a table (FIG. 11) containing each recommended variable along
with that recommended variable's associated logworth in addition to the
corresponding input variable name. Block 266 obtains the logworth for
each recommended variable from the... model directly from the model
repository, the results from the model are displayed by an OLAP viewer
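The axis-selection rule described for FIG. 14 amounts to ranking the recommended variables by logworth. The same ranking yields a top-k list like the "Number of Recommended Dimensions" setting. Below is a minimal sketch. The function names and logworth values are hypothetical. "Income" is an invented variable name; Statecode and Gender come from the FIG. 7 example.

```python
def recommend_dimensions(logworths, k):
    """Return the k candidate variables with the largest logworth, best first."""
    return sorted(logworths, key=logworths.get, reverse=True)[:k]


def chart_axes(logworths):
    """Largest logworth -> horizontal axis; second largest -> forward axis."""
    top_two = recommend_dimensions(logworths, 2)
    if len(top_two) < 2:
        raise ValueError("need at least two recommended variables")
    return {"horizontal": top_two[0], "forward": top_two[1]}


# Hypothetical logworths for three recommended variables.
example = {"Statecode": 4.2, "Gender": 1.3, "Income": 2.7}
print(chart_axes(example))
```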
...For example, in an alternate embodiment, the users declare variables
that are traditionally used as OLAP dimensions (within their input data
set) as OLAP dimension variables. These variables are marked to notify
the analyst of their role as OLAP dimension variables. The analyst can
choose to use these traditional OLAP dimensions in the MDDB, along
with the recommended variables from the competing initial split method...

...model repository model (provided the same variables exist in the data
they are analyzing) as OLAP dimension variables in their current model.
This is helpful, for example, if from month-to...

...recommended variables for that model, the user could select last month's
recommended variables as OLAP dimension variables for their input data
set for the current month's model. The user... found by the decision tree
algorithm. The present invention measures this relation based upon the
logworth of each split (which can be user-defined or decision tree
algorithm defined), or based...

Exemplary or Independent Claim(s):

...module connected to the data store that determines a subset of the
dimension variables for splitting the input data, wherein the
splitting by the dimension variable subset predicts the target
variable; and a multi-dimension viewer that...

...variables and at least one target variable; determining a subset of the
dimension variables for splitting the input data, wherein the
splitting using the dimension variable subset predicts the target
variable; and generating a report using the... stored input data;
after receiving the request, determining a subset of the dimension
variables for splitting the input data, wherein the splitting
using the dimension variable subset predicts the target variable;
displaying the determined dimension variables subset...

Non-exemplary or Dependent Claim(s):

...6. The apparatus of claim 5 wherein the statistic measure is a
logworth statistic measure... 39. The method of claim 38 wherein
the statistic measure is a logworth statistic measure...
52 26 RD (unique items)
File 20:Dialog Global Reporter 1997-2004/Dec 16
File 340:CLAIMS(R)/US Patent 1950-04/Dec 14
(c) 2004 IFI/CLAIMS(R)
File 416:DIALOG COMPANY NAME FINDER(TM) 2004/Nov
(c) 2004 Dialog Info.Svcs.
(c) 2004 Thomson Financial Networks
File 638:Newsday/New York Newsday 1987-2004/Dec 14
(c) 2004 Newsday Inc.
File 654:US Pat.Full. 1976-2004/Dec 14
File 704:(Portland)The Oregonian 1989-2004/Dec 13
(c) 2004 The Oregonian
(c) 1996 Star Tribune
File 727:Canadian Newspapers 1990-2004/Dec 16
(c) 2004 Southam Inc.
File 757:Mirror Publications/Independent Newspapers 2000-2004/Dec 14
File 781:ProQuest Newsstand 1998-2004/Dec 16
File 990:NewsRoom Current Sep 1-2004/Dec 16
(c) 2004 The Dialog Corporation
File 991:NewsRoom 2004 Jan 1-2004/Aug 31
(c) 2004 The Dialog Corporation
File 993:NewsRoom 2002
(c) 2004 The Dialog Corporation
File 995:NewsRoom 2000
(c) 2004 The Dialog Corporation



3/3,K/1 (Item 1 from file: 20)

DIALOG (R) File 20: Dialog Global Reporter 
(c) 2004 The Dialog Corp. All rts. reserv. 

04349539 (USE FORMAT 7 OR 9 FOR FULLTEXT) 
INDONESIAN NEWSPAPER HIGHLIGHTS - FEB 16, 1999 

ASIA PULSE 
February 16, 1999 

JOURNAL CODE: WAPL LANGUAGE: English RECORD TYPE: FULLTEXT 
WORD COUNT: 291 

(USE FORMAT 7 OR 9 FOR FULLTEXT) 

... their financial problems. 

At least 11 forest concessioners has exported 114,000 cubic meters 
of log worth US$13.2 mln since Jan 1. 
(ANTARA) 



3/3, K/2 (Item 2 from file: 20) 

DIALOG (R) File 20: Dialog Global Reporter 
(c) 2004 The Dialog Corp. All rts. reserv. 

03274353 (USE FORMAT 7 OR 9 FOR FULLTEXT) 
INDONESIA'S FORESTRY EXPORT EARNINGS RISE 

ASIA PULSE 
October 29, 1998 

JOURNAL CODE: WAPL LANGUAGE: English RECORD TYPE: FULLTEXT 
WORD COUNT: 168 

(USE FORMAT 7 OR 9 FOR FULLTEXT) 

...762 million worth rattan furniture, $US553.607 million from other
wood-based products, timber and log worth $US1.182 billion, wood panel
worth $US2.682 million, and pulp and paper worth $US3...



3/3,K/3 (Item 1 from file: 416)

DIALOG (R) File 416: DIALOG COMPANY NAME FINDER (TM) 
(c) 2004 Dialog Info.Svcs. All rts. reserv. 

170662411 

LOGWORTH LIMITED (CO=) 

DIALOG FILE 561: ICC BRITISH CO.DIR 

(C) 2004 ICC ONLINE INFORMATION GROUP 
RECORDS AS OF 08/18/04: 1 
TYPE OF DATA: Directory 



3/3,K/4 (Item 1 from file: 545)

DIALOG (R) File 545: Investext(R)
(c) 2004 Thomson Financial Networks. All rts. reserv.
06755278 

Spectra-Physics - Company Report

ENSKILDA SECURITIES 
Eriksson, A.H. 
UNITED KINGDOM 

DATE: December 17, 96 

INVESTEXT(tm) REPORT NUMBER: 1821957, PAGE 6 OF 13, TEXT/TABLE PAGE
This is a(n) COMPANY report. 

TEXT: 

...LOG is being distributed to Spectra-Physics' shareholders in order to
eliminate any tax effects.

LOG worth SEK 80 per share



3/3, K/5 (Item 1 from file: 561) 

DIALOG (R) File 561: ICC British Co.Dir 

(c) 2004 ICC Online Information Group. All rts. reserv. 

08403875 (FOR FULL FORMAT, USE FORMAT 9)
LOGWORTH LIMITED 

BAKER TILLY CHARTERED ACCOUNTANT 
BRAZENNOSE HOUSE 
LINCOLN SQUARE 
MANCHESTER M2 5BL 
COUNTRY: ENGLAND & WALES

REGISTERED COMPANY NUMBER: 02049180 
ACCOUNT TYPE: FULL ACCOUNTS 

COMPANY TYPE: Private limited with share capital 
This is a DISSOLVED company 



3/3,K/6 (Item 1 from file: 638) 

DIALOG (R) File 638 : Newsday/New York Newsday 
(c) 2004 Newsday Inc. All rts. reserv. 

09531053 

BASKETBALL SUMMARIES (stand alone chart) 

Newsday (ND) - Saturday January 31, 1998 
Edition: NASSAU Section: SPORTS Page: A33 
Word Count: 4,115

...4-0-8, Becker 5-0-13, Martens 5-9-20, Salvage 4-1-11, Logworth 2-0-4.
Totals: 22-10-60. Three-point goals: J 4 (Kwiat 2, Golub...



3/3,K/7 (Item 1 from file: 704)

DIALOG (R) File 704: (Portland) The Oregonian
(c) 2004 The Oregonian. All rts. reserv.

05851029

THE WORLD'S MAJOR TIMBER TRADERS ARE 

OREGONIAN (PO) - MONDAY December 17, 1990 
By: RICHARD READ - of the Oregonian Staff 
Edition: FOURTH Section: LOCAL STORIES Page: A01 
Word Count: 2,424

He hopped aboard a 500-year-old Sitka spruce log worth about $3,000
less than its purchase price earlier this year, equating its bleached
surface...



3/3,K/8 (Item 2 from file: 704)

DIALOG (R) File 704: (Portland) The Oregonian
(c) 2004 The Oregonian. All rts. reserv. 

05731069 

BRAZIL'S UNDERWATER LOGGER HARVESTS WOOD FROM HUGE LAKE 

OREGONIAN (PO) - SUNDAY August 19, 1990 
By: JAMES BROOKE - New York Times News Service 
Edition: SECOND Section: WIRE STORIES Page: A10 
Word Count: 707 

...olive in a jar," Gomes said as a barge derrick hoisted a dripping 2-ton
log. Worth $400 uncut, the log is of anjelywood, a tropical hardwood
used for furniture. 



The derrick. 



3/3,K/9 (Item 1 from file: 724)

DIALOG (R) File 724: (Minneapolis) Star Tribune
(c) 1996 Star Tribune. All rts. reserv. 

05727091 

UNDERWATER LOGGERS HARVEST AMAZON'S LOST TREASURE 

STAR TRIBUNE (MS) - Tuesday, August 14, 1990 
By: James Brooke, New York Times 
Edition: METRO Section: NEWS Page: 04A 
Word Count: 486 

...olive in a jar," Gomes said as a barge derrick hoisted a dripping
two-ton log worth $400 uncut. Gomes estimates it will take 15 years to
harvest Tucurui's submerged wood...



3/3,K/10 (Item 1 from file: 727) 

DIALOG (R) File 727: Canadian Newspapers 
(c) 2004 Southam Inc. All rts. reserv. 

00342877 (USE FORMAT 7 FOR FULLTEXT)

Logging the depths: Backwoods inventor inspires new growth industry for 
lands flooded by Amazon hydro projects 

JAMES BROOKE 

Vancouver Sun, 3* ED, P B7 
August 18, 1990 

DOCUMENT TYPE: STORY; NEWSPAPER LANGUAGE: ENGLISH RECORD TYPE: FULLTEXT

Word Count: 687 

...olive in a jar," Gomes said as a
barge derrick hoisted a dripping two-ton log. Worth $400 uncut,
the log is of anjelywood, a tropical hardwood used for furniture.

The derrick...



3/3,K/11 (Item 2 from file: 727)

DIALOG (R) File 727: Canadian Newspapers 
(c) 2004 Southam Inc. All rts. reserv. 

00088869 (USE FORMAT 7 FOR FULLTEXT) 

There's a logging boom along the Amazon - underwater

JAMES BROOKE 

Gazette (Montreal), Final ED, P L2 
August 25, 1990 

DOCUMENT TYPE: NEWSPAPER LANGUAGE: ENGLISH RECORD TYPE: FULLTEXT 
SECTION HEADING: Comics & Hobbies 
Word Count: 921 

...olive in a jar," Gomes said as a
barge derrick hoisted a dripping two-ton log. Worth $400 uncut,
the log was of anjelywood, a tropical hardwood used for furniture.

The derrick...



3/3,K/12 (Item 1 from file: 781) 

DIALOG (R) File 781: ProQuest Newsstand
EXPLORING DECISION TREES IN SAS ENTERPRISE MINER 4.1

By Dr. Nick Evangelopoulos 
Based on material from SAS Education 



The SAS System
WELCOME to this SAS Enterprise Miner 4.1 tutorial. Parts of this write-up are based on
SAS Education material. This handout introduces you to assignment PR3.

STARTING A NEW ANALYSIS 

Although related to the previous analyses (PR1 and PR2), the analysis described in this
handout does not build on them; instead, it starts the exploration of decision trees from
the beginning. As a start, we will build a SAS data library called data4520.

BUILDING A PREDICTIVE DECISION TREE 

• Add to your Diagram Workspace an Input Data Source Node pointing at 
PVA_RAW_DATA, located in the data4520 library. Make sure all the 
variable model roles are set correctly and make TARGET_B your target 
variable with TARGET_B=1 the target event. Then add a default Tree node. 





[Diagram: DATA4520.PVA_RAW_DATA -> Tree]

• Run the Tree and view the results. The results window contains a Summary
Table in its upper-left part (Screen #1). View the tree, go to Tree Options,
change the tree depth field to 6, and observe some terminal nodes (leaves) where
everybody (100%) donated. Follow the path from the root to a particular leaf
with 5 persons, all of whom donated. Compose the decision rule that corresponds
to this leaf. This rule tells a success story that is true for those 5 persons but
is perhaps not generalizable to the entire population.
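A decision rule is the conjunction of the split conditions on the path from the root to a leaf. The sketch below represents a tree as nested Python objects and recovers that rule. The Node class, the variables X1 and X2, and all counts are hypothetical stand-ins, not Enterprise Miner output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    condition: Optional[str] = None  # split condition on the branch into this node
    n: int = 0                       # number of cases in the node
    pct_event: float = 0.0           # percentage of cases with TARGET_B = 1
    children: List["Node"] = field(default_factory=list)


def path_to(node: Node, leaf_id: int) -> Optional[List[Node]]:
    """Depth-first search for the nodes from `node` down to `leaf_id`."""
    if node.node_id == leaf_id:
        return [node]
    for child in node.children:
        sub_path = path_to(child, leaf_id)
        if sub_path is not None:
            return [node] + sub_path
    return None


def decision_rule(root: Node, leaf_id: int) -> str:
    """AND together the conditions on the root-to-leaf path."""
    path = path_to(root, leaf_id)
    if path is None:
        raise KeyError(f"leaf {leaf_id} not found")
    leaf = path[-1]
    conditions = [node.condition for node in path[1:]]  # the root has no condition
    return (f"IF {' AND '.join(conditions)} "
            f"THEN {leaf.pct_event:.0f}% of {leaf.n} cases donated (TARGET_B = 1)")


# Hypothetical tree: node 4 is a pure leaf of 5 donors.
root = Node(1, None, 100, 9.0, [
    Node(2, "X1 < 3", 60, 8.3, [Node(4, "X2 = 'A'", 5, 100.0),
                                Node(5, "X2 != 'A'", 55, 0.0)]),
    Node(3, "X1 >= 3", 40, 10.0),
])
print(decision_rule(root, 4))
```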



TUNING THE DECISION TREE 

• Add a Data Partition node and choose to use 50% of the data for Training
and 50% for Validation.

[Diagram: DATA4520.PVA_RAW_DATA -> Data Partition -> Tree]



• Run the Decision Tree again and view the results. The Assessment Plot (Screen
#2) reveals that the accuracy on the validation data is uniformly higher than that
of the training data, which is counterintuitive. A likely cause is that the random
partition left the two data sets with different proportions of donors.

• Open the Data Partition node and change the sampling method to Stratified. The 
Stratification tab becomes ungrayed. Select the Stratification tab and set the 
status for TARGET_B to use. The training and validation sets now contain a 
similar proportion of donors. Close the Data Partition node and save the changes. 
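Stratified partitioning samples each target level separately, so both partitions keep the same proportion of donors. Below is a minimal sketch, assuming rows are dictionaries and the target column is TARGET_B. The function name and the 10% donor rate in the example are hypothetical, and this is not Enterprise Miner's own sampling code.

```python
import random
from collections import defaultdict


def stratified_split(rows, target, train_frac=0.5, seed=0):
    """Split rows into (train, valid), sampling each target level separately."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[target]].append(row)
    train, valid = [], []
    for level in sorted(strata):
        members = list(strata[level])
        rng.shuffle(members)  # random order within the stratum
        cut = round(len(members) * train_frac)
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return train, valid


# Hypothetical data: 100 donors and 900 non-donors.
rows = [{"id": i, "TARGET_B": 1 if i < 100 else 0} for i in range(1000)]
train, valid = stratified_split(rows, "TARGET_B")
print(sum(r["TARGET_B"] for r in train), sum(r["TARGET_B"] for r in valid))  # 50 50
```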

• Grow the tree again and verify that this time the Assessment Plot exhibits typical
behavior. As model complexity increases, performance first improves on both training
and validation data; then the two curves diverge (Screen #3).



SPECIFYING POPULATION PRIORS 

• Open the Input Data Source node. Select the Variables tab. Click on the
Model Role of TARGET_B and select Edit Target Profile. If no target
profile is found, say "yes" to create one. Select the Prior tab. Add a new
Prior Vector and set it to use. Set the prior probabilities so that they reflect a 5%
donor proportion. Close the target profile and save the settings. Close the Input
Data Source node and save the settings.
• Run the Tree again and view the results. The Assessment Plot now (Screen #4)
shows that a final tree with only one node was selected!
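Here is one plausible reading of why a single-node tree is chosen. Priors rescale each leaf's predicted probabilities from the training proportion to the stated population proportion. With a 5% donor prior, very few leaves keep an adjusted donor probability above 0.5. Without a profit matrix, every tree then classifies nearly everyone as a non-donor, and the simplest tree does as well as a larger one. The sketch below uses the standard prior-correction formula. The 25% training donor proportion is a hypothetical value, not a figure from PVA_RAW_DATA.

```python
def adjust_for_priors(posterior, sample_props, priors):
    """Rescale class probabilities estimated on a training sample to new priors.

    p_adj(k) = p(k) * prior(k) / sample(k), renormalised to sum to 1.
    """
    weights = {k: posterior[k] * priors[k] / sample_props[k] for k in posterior}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical: donors are 25% of the training data but 5% of the population.
sample = {1: 0.25, 0: 0.75}
prior = {1: 0.05, 0: 0.95}
for p_leaf in (0.5, 0.8, 0.9):
    adjusted = adjust_for_priors({1: p_leaf, 0: 1 - p_leaf}, sample, prior)
    print(p_leaf, round(adjusted[1], 3))
```

Under these assumed proportions, even a leaf with 80% donors in the training data ends up with an adjusted donor probability below 0.4.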




DEFINING A PROFIT MATRIX AND GROWING A PROFITABLE TREE 

• Open the Input Data Source node. Select the Variables tab. Click on the
Model Role of TARGET_B and select Edit Target Profile. Select the
Assessment Information tab. Add a Profit Matrix and set it to use. Set up
the profit matrix by typing 15.05 in the upper-left cell, -0.68 in the lower-left cell, and
0 in the two others. Close the Target Profiles window and save the settings.
Close the Input Data Source window and save the changes.
• Run the Tree and view the results. View the tree. Each node shows the
decision (prediction) and the associated profit in the training and
validation sets.
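With this profit matrix, a case is worth soliciting whenever its expected profit is positive. That happens when the donor probability exceeds 0.68 / (15.05 + 0.68), roughly 0.043. The sketch assumes the rows of the matrix are the actual outcomes (donor first) and the columns are the decisions (solicit first). That layout, the dictionary, and the function names are assumptions for illustration.

```python
# Assumed layout: keys are (actual outcome, decision); 1 = donor / solicit.
PROFIT = {
    (1, 1): 15.05,  # donor, solicited: upper-left cell
    (0, 1): -0.68,  # non-donor, solicited: lower-left cell
    (1, 0): 0.0,    # donor, not solicited
    (0, 0): 0.0,    # non-donor, not solicited
}


def expected_profit(p_donor, decision):
    """Probability-weighted profit of a decision for one case."""
    return p_donor * PROFIT[(1, decision)] + (1 - p_donor) * PROFIT[(0, decision)]


def best_decision(p_donor):
    """Choose the decision (1 = solicit, 0 = do not) with the larger expected profit."""
    return max((1, 0), key=lambda d: expected_profit(p_donor, d))


# Donor probability at which soliciting and not soliciting break even.
break_even = -PROFIT[(0, 1)] / (PROFIT[(1, 1)] - PROFIT[(0, 1)])
print(round(break_even, 4))  # 0.0432
```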




CONSOLIDATING CATEGORICAL INPUTS 

We will now demonstrate how to use a tree model to group categorical input 
levels and create useful inputs for regression and neural network models. 

• Add a Replacement node and a Regression node. Open the Regression
node. Select the Selection Method tab and change the method to stepwise. Run
the Regression node.

• Connect a Tree node to the Replacement node and label it Consolidation Tree. Open
the Consolidation Tree node and change the status of all inputs to "don't use".
Set the status of CLUSTER_CODE to "use". The categorical input variable
CLUSTER_CODE has more than 50 distinct levels. With so many distinct levels,
its usefulness as an input in a regression or neural network model is limited. We
will use a tree model to group these levels based on their association with
TARGET_B and create a new model input. This input can be used instead of
CLUSTER_CODE in a regression or neural network model.
• Run the Consolidation Tree and view the results. 
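The idea behind the Consolidation Tree can be sketched outside Enterprise Miner. Order the CLUSTER_CODE levels by donor rate, try every cut point along that order, and keep the two-group split with the largest chi-square statistic. For a binary target, the chi-square statistic is proportional to the between-group sum of squares of the 0/1 target. A classic result on grouping for maximum homogeneity then makes the best two-group partition contiguous in this ordering, so only L - 1 cuts need checking. This is a simplified stand-in, not Enterprise Miner's split search. It ignores the Kass adjustment, and the level codes and counts in the example are hypothetical.

```python
import math


def chi_square_two_groups(a, b, c, d):
    """Pearson chi-square for [[a, b], [c, d]]: rows are the two groups of
    levels, columns are (event, non-event) counts."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def consolidate_levels(counts):
    """counts: level -> (events, non_events), at least two levels.
    Returns (logworth, group_1_levels, group_2_levels) for the best cut."""
    # Sort levels by event rate; only cuts along this order are examined.
    order = sorted(counts, key=lambda lvl: counts[lvl][0] / sum(counts[lvl]))
    best = None
    for cut in range(1, len(order)):
        left, right = order[:cut], order[cut:]
        a = sum(counts[lvl][0] for lvl in left)
        b = sum(counts[lvl][1] for lvl in left)
        c = sum(counts[lvl][0] for lvl in right)
        d = sum(counts[lvl][1] for lvl in right)
        stat = chi_square_two_groups(a, b, c, d)
        if best is None or stat > best[0]:
            best = (stat, left, right)
    stat, left, right = best
    p_value = max(math.erfc(math.sqrt(stat / 2.0)), 1e-300)  # 1-df upper tail
    return -math.log10(p_value), left, right


# Hypothetical levels with (donors, non-donors) counts.
print(consolidate_levels({"A": (1, 99), "B": (2, 98), "C": (30, 70), "D": (35, 65)}))
```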





[Diagram: Replacement -> Consolidation Tree]



Disappointingly, the tree algorithm found no significant splits. The primary reason for
this is the Kass adjustment to logworth discussed in the previous section. The adjustment
penalizes the logworth of potential CLUSTER_CODE splits by an amount equal to the
log of the number of partitions of the L levels of CLUSTER_CODE into two groups, or
log10(2^(L-1) - 1). With 54 distinct levels, the penalty is quite large. It is also quite unnecessary.
The penalty avoids favoring inputs with many possible splits. Here we are building a tree
with only one input. It is impossible to favor this input over others because there are no
other inputs.
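The size of the Kass penalty is easy to check numerically. The sketch below computes log10(2^(L-1) - 1). It also confirms by brute force, for small L, that 2^(L-1) - 1 is the number of ways to split L levels into two non-empty groups. The function names are illustrative, and the raw logworth of 12 in the check is a hypothetical value.

```python
import math
from itertools import product


def kass_penalty(num_levels):
    """log10 of the number of two-group partitions of `num_levels` levels."""
    if num_levels < 2:
        raise ValueError("need at least two levels to split")
    return math.log10(2 ** (num_levels - 1) - 1)


def count_two_group_partitions(num_levels):
    """Brute force: assign each level to group 0 or 1, pin the first level to
    group 0 so mirror-image assignments are not double counted, and require
    group 1 to be non-empty."""
    return sum(
        1
        for groups in product((0, 1), repeat=num_levels)
        if groups[0] == 0 and 1 in groups
    )


def kass_adjusted_logworth(raw_logworth, num_levels):
    """Subtract the Kass penalty from an unadjusted logworth."""
    return raw_logworth - kass_penalty(num_levels)


print(round(kass_penalty(54), 2))  # 15.95
```

A CLUSTER_CODE split therefore needs a raw logworth of roughly 16 just to break even after adjustment.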
• Open the Consolidation Tree and select the Advanced tab. Deselect the Kass p-value
adjustment. Run the Consolidation Tree again and view the results.
Manually select a tree with 2 leaves and save the changes.



To use the grouped values of CLUSTER_CODE in a subsequent model, we must add the 
predicted values to the training and validation data. 



• Close the Results window and once more open the Consolidation Tree node.

• Select the Score tab and then select Process or Score: Training, Validation, and Test.

• Select the Variables subtab.

• Deselect all checkboxes except Leaf identification variable.



[Screenshot: "Tree: Model Untitled" window with tabs Data, Variables, Basic, Advanced, Score, Notes; Score tab, Variables subtab, listing the checkboxes Input variable selection, Dummy variables, Leaf identification variable, and Prediction variables]



• Close the Tree Model window and save the changes.

• Run the Consolidation Tree node. You need not view the results.

The Tree node adds a variable called _NODE_ to the training data. To use this
variable in a subsequent analysis, you must change its Model Role to input.
This is done using a Data Set Attributes tool.

• Add a Data Set Attributes node to the diagram as shown.




[Diagram: Consolidation Tree -> Data Set Attributes, with the Assessment node also in the workspace]



Open the Data Set Attributes node. The Data Set Attributes window opens. 



[Screenshot: Data Set Attributes window, Data tab]
The Data tab lists three data sets exported from the Consolidation Tree node. The
first is the Outtree data set generated by the SAS procedures underlying the Tree
node. The second and third are the training and validation data sets.
• Select the training data set (second from the top) and select the Variables tab.



[Screenshot: Data Set Attributes window, Variables tab]
The Variables tab displays the current metadata settings for the training data. You
can change these settings by right-clicking in one of the white columns.



• Scroll the variables list to show the variable called _NODE_.



[Screenshot: Data Set Attributes window]
The Consolidation Tree model assigns each case to a leaf or node. The _NODE_
variable identifies this leaf. You can use this variable as a consolidation of the
original CLUSTER_CODE input.
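Conceptually, the _NODE_ variable is a lookup from each CLUSTER_CODE level to the leaf that level falls into. Below is a minimal sketch, assuming rows are dictionaries. The level codes are hypothetical, and the leaf ids 2 and 3 mirror the two branches named at the end of this handout. Mapping unseen levels to None is this sketch's own choice, not documented Enterprise Miner behavior. Dropping CLUSTER_CODE stands in for setting its model role to rejected.

```python
def add_node_variable(rows, level_to_leaf, source="CLUSTER_CODE"):
    """Return copies of `rows` in which `source` is replaced by a _NODE_ leaf id."""
    result = []
    for row in rows:
        # Drop the original high-cardinality input (the sketch's "rejected").
        new_row = {key: value for key, value in row.items() if key != source}
        # Levels that were never seen when the tree was grown map to None.
        new_row["_NODE_"] = level_to_leaf.get(row[source])
        result.append(new_row)
    return result


# Hypothetical level-to-leaf assignment from a two-leaf consolidation tree.
leaf_of = {"A": 2, "B": 2, "C": 3, "D": 3}
print(add_node_variable([{"CLUSTER_CODE": "C", "TARGET_B": 1}], leaf_of))
```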

By default, Enterprise Miner assigns a Model Role of group to the _NODE_ 
variable. You must change its role to input. 

• Right-click on the Model Role column for _NODE_ and select
Set new model role > input.

• Similarly, change the model role of CLUSTER_CODE to rejected.

• Close the Data Set Attributes window. 

Now see whether the newly created input is useful enough to be selected in the regression 
model. 

• Connect a Regression node to the Data Set Attributes node. Label the node 
Consolidation Regression. 
[Diagram: DATA4520.PVA_RAW_DATA with Regression and Assessment nodes; Consolidation Tree -> Data Set Attributes -> Consolidation Regression]



• Open the Consolidation Regression node and verify the input _NODE_ has been 
added to the variables list. 

• Select the Selection Method tab and select the stepwise method. 

• Close the Linear and Logistic Regression window and save the changes. Name the 
model Consolidate. 

• Run the Regression node and view the results. 

• The overall average profit on the validation data is higher than that of the
earlier standard regression model.

• Select the Output tab and scroll to the bottom of the report. 

Not only is _NODE_ selected as an input, but cases in the left branch of the Consolidation
Tree (node 2) are also 21% less likely to respond than cases in the right branch (node 3).
Multi-dimension data analysis apparatus for business activities,
determines dimension variable subset for dividing input data and
generates report using divided input data
Inventor: CHAPMAN T K; CHU C R; TIDEMAN S C
Abstract (Basic): US 20020099581 A1

NOVELTY - A database stores input data (32) having dimension
variables and one target variable. A decision tree positioning module
(38) determines a subset of dimension variables for dividing input
data. A multi-dimension viewer (54) generates a report using the
divided input data.

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is included for a
multi-dimension data analysis method.

USE - For understanding outcomes of business activities such as
enterprises.

ADVANTAGE - Defines market segments in a way that is most
meaningful for understanding the outcomes of business activities given
the large volume of data collected and maintained by an enterprise.

DESCRIPTION OF DRAWING(S) - The figure shows a diagram of
components of the data analysis apparatus.

Input data (32)
Decision tree positioning module (38)
Multi-dimension viewer (54)
WAITS T
US 5721831 A 60 G06F-017/60

Abstract (Basic): US 5721831 A

The apparatus includes a device for generating a first display which
lists selected goals of a bank that are termed PROJECTS, and allows a
user to select a PROJECT. Display buttons are used to activate
options, which operate on data associated with the selected PROJECT. An
OPEN button generates a second display containing a list of market
SEGMENTS, comprising subsets of customers to whom the bank markets
products, and to which the selected PROJECT is directed. A DESCRIPTION
button generates a description of the selected PROJECT.

The selected SEGMENT may include an ANALYSIS button, which
generates a window allowing the user to retrieve stored data concerning
individual members of the selected SEGMENT. A STRATEGY button generates
a window allowing the user to view STRATEGIES associated with the
selected SEGMENT. The STRATEGIES indicate behaviours sought to be
induced in members of the selected SEGMENT. A CAMPAIGN button generates
a window listing actions to be taken in pursuit of a strategy.

ADVANTAGE - Simple to use, while allowing a market analyst of a bank
to divide the customer database into segments and examine the response of
selected segments to marketing strategies.
Abstract (Basic): WO 200425501 A2

NOVELTY - The analysis method has a statistical probability model 
(112) provided from the useful data (110), for allowing statistical 
analysis (120) of the useful data using a data mining method, a 
clustering method, an association rules method or a decision tree . 

DETAILED DESCRIPTION - Also included are INDEPENDENT CLAIMS for the 
following : 

(a) a device for analysis of useful data structured as a databank; 

(b) a computer program product with a memory medium storing a
computer program for analysis of useful data structured as a databank;

(c) a computer-readable memory medium storing a computer program
for analysis of useful data structured as a databank;

(d) a computer program with program codes for analysis of useful
data structured as a databank;

(e) a computer program product with program codes for analysis of
useful data structured as a databank stored on a machine-readable
carrier.

USE - The method is used for analysis of useful data structured as
a databank, e.g. customer or product data for customer relationship
management, supply chain management, or marketing strategy
management.

ADVANTAGE - The method allows analysis of a large quantity of useful
data.

DESCRIPTION OF DRAWING(S) - The figure shows a schematic diagram of
the functioning of an analysis system for analysis of customer data.
(Drawing includes non-English language text).
Statistical probability model (112)
Abstract (Basic): US 20040002981 A1

NOVELTY - A support for each state in high-cardinality attribute, 
is determined, when a node with associated data set is considered for a 
possible split and the attribute is considered as input or output 
attribute. The usable states of high-cardinality attribute are selected 
based on the support. 

. DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following : 

(1) computer readable medium storing instructions for handling 
high-cardinality attribute in decision tree ; and 

(2) computer device for using high-cardinality attribute. 

USE - For handling high-cardinality attributes in a decision tree
of a computer device (claimed), e.g. for customer attribution,
credit-risk management, fraud detection or marketing decisions.

ADVANTAGE - By utilizing only the most popular states of the 
high-cardinality data and ignoring other states, the associated cost, 
power and time are reduced. 

DESCRIPTION OF DRAWING(S) - The figure shows a flow diagram
explaining the procedure for handling a high-cardinality attribute in a
decision tree.
Abstract (Basic): US 20040002879 A1

NOVELTY - An interestingness score is determined for each output
attribute based on the difference between the entropy of the output
attribute, E(A), and a most favored entropy value, M. A predetermined
number of output attributes with the highest interestingness scores are
selected for use in decision trees.
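A minimal Python sketch of the entropy-based selection follows. It rests on three assumptions: the score is the negated absolute difference |E(A) - M|, since the abstract only says it is "based on" the difference; entropy is measured in bits; and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy E(A), in bits, of an attribute's observed states."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def interestingness(values, most_favored):
    """Assumed scoring rule: the closer E(A) is to M, the higher the score."""
    return -abs(entropy(values) - most_favored)

def select_output_attributes(columns, most_favored, k):
    """Return the names of the k columns with the highest scores
    (ties broken alphabetically)."""
    scores = {name: interestingness(vals, most_favored)
              for name, vals in columns.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))[:k]
```

Take M = 1 bit. A balanced yes/no attribute (E(A) = 1) then ranks ahead of a constant attribute (E(A) = 0) and ahead of an identifier that is unique per row. Both of those are poor prediction targets.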

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the
following:

(1) method for selecting output and input attributes; 

(2) method for selecting input attributes; 

(3) computer readable medium comprising computer executable 
modules; and 

(4) computer device. 

USE - For selecting output attributes for decision-making in business
applications such as marketing.

ADVANTAGE - Reduces the number of output and input attributes by
selecting only the highest-value attributes, thereby decreasing memory
space and processing time requirements and increasing processing
efficiency. Also increases the utility of the resulting tree.

DESCRIPTION OF DRAWING(S) - The figure illustrates the flow diagram
of the decision tree.
NOVELTY - The samples (115), corresponding to user identities, and
features (120), corresponding to products, are loaded into the database
application. The useful predictors (170), corresponding to a portion of
the features, are defined using a decision tree of the features. The
prediction results (180) are determined by processing the decision
tree using the predictors.

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following: 

(1) computer readable medium storing data processing program; and 

(2) data processing system. 

USE - For processing large data sets containing samples and
features in retail, financial, communication and marketing
organizations.

ADVANTAGE - Allows large data sets to be processed accurately,
quickly and efficiently.

DESCRIPTION OF DRAWING(S) - The figure shows a block diagram of the
data processing system.
Abstract (Basic): WO 200376895 A2

NOVELTY - Deriving an outcome predictor for a data set, where 
variables affect outcome for the data set, comprising generating basis 
functions for interactions among the variables for the data set using a 
flexible nonparametric tool; and applying a recursive partitioning 
methodology to the data set, using the generated basis functions, to 
produce the outcome predictor, is new. 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for: 

(a) a system for deriving an outcome predictor for a data set, 
comprising a generating mechanism for generating the basis functions 
for interactions among the variables for the data set using the 
flexible nonparametric tool; and an application mechanism for applying 
a recursive partitioning methodology to the data set, using the 
generated functions, to produce the outcome predictor; and 

(b) a computer program product, comprising a computer usable medium 
having a control logic stored for causing a computer to derive the 
outcome predictor for a data set, where the control logic comprises a 
first computer readable program code for causing the computer to 
generate the basis functions; and a second computer readable program 
code for causing the computer to apply the recursive partitioning 
methodology to the data set. 

USE - The method is used for deriving an outcome predictor for a 
data set. The outcome predictor comprises a decision tree for a 
genetic mapping study used to determine gene and environment 
interactions. The outcome predictor comprises a decision tree for 
use as a mass marketing study for a product. It relates the genotypic 
information to treatment type(s), including an administered drug. The
outcome predictor is used to determine a personalized treatment regime 
for an individual, where the individual has a disease and a genotype, 
where the outcome predictor comprises a decision tree containing a 
result for the genotype of the individual, having a disease, e.g. human 
immunodeficiency virus (HIV), autism, AIDS, a blood disease, hepatitis,
heart disease, diabetes, epilepsy, cancer, a mental disorder, a 
neurological disorder, liver disease, a urological disorder, a kidney 
disorder, or a congenital defect (all claimed). It could be used to
identify genetic factors that render individuals susceptible to a 
variety of inherited and acquired diseases, as well as to develop drug 
resistance profiles that result from treating these ailments. It can be 
used to sort out variables that lead to the development of autism. It 
can be employed to predict a single variable from variables in many 
different areas, including but not limited to the medical, behavioral, 
biologic, physical, engineering, and economic sciences, as well as in 
marketing and business. It is generally beneficial in deriving the
relationship between one continuous outcome variable and many
predictors.

ADVANTAGE - The inventive method accurately predicts outcomes to 
problems having complex variables. It predicts treatment outcomes, e.g. 
drug response, for diseases involving numerous complex variables. It 
determines effectiveness of medical treatment (e.g., drug 
effectiveness) for particular conditions, e.g. diseases. It is usable 
to predict in clinical trials whether a subject is likely to be a 
placebo responder. It can overcome the identification problem by 
reducing the dimension of the parameter space and identifying important
interactions.

DESCRIPTION OF DRAWING(S) - The figure presents various components
of a standalone system for deriving an outcome predictor for a data set 
having variables affecting outcome. 
Abstract (Basic): US 20030061228 A1

NOVELTY - Decision tree system for use in data mining comprises 
an object oriented pattern recognition algorithms module comprising a 
decision tree system including four object oriented modules to 
respectively read the data, sort the data, determine the best manner to 
split the data according to some criterion, and to split the data. 

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for a 
decision tree method for use in data mining files containing objects 
having relevant features, which comprises recognizing patterns among 
the objects based upon the features, creating a decision tree 
system, reading the data using an object oriented module, sorting the
data using an object oriented module if sorting is necessary,
determining the best manner to split the data into subsets according to 
some criterion using an object oriented module, and splitting the data 
using an object oriented module. 
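The four modules (read, sort, determine the best split, split) can be illustrated with the minimal Python sketch below. It makes several assumptions not in the source. Plain functions stand in for the patent's object-oriented modules. Only one numeric feature is handled. The split criterion, which the abstract leaves as "some criterion", is taken to be weighted Gini impurity.

```python
from typing import List, Tuple

Record = Tuple[float, str]  # (feature value, class label)

def read_data(rows) -> List[Record]:
    """'Read' step: normalize raw (value, label) rows."""
    return [(float(v), str(label)) for v, label in rows]

def sort_data(records: List[Record]) -> List[Record]:
    """'Sort' step: order by feature value so thresholds can be scanned."""
    return sorted(records, key=lambda r: r[0])

def gini(labels) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(sorted_records: List[Record]):
    """'Best split' step: threshold minimizing weighted Gini impurity
    (assumed criterion); returns None if no split is possible."""
    n = len(sorted_records)
    best_impurity, best_threshold = float("inf"), None
    for i in range(1, n):
        lo, hi = sorted_records[i - 1][0], sorted_records[i][0]
        if lo == hi:
            continue  # no threshold separates equal values
        left = [label for _, label in sorted_records[:i]]
        right = [label for _, label in sorted_records[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_impurity, best_threshold = impurity, (lo + hi) / 2
    return best_threshold

def split_data(records: List[Record], threshold: float):
    """'Split' step: partition records at the chosen threshold."""
    left = [r for r in records if r[0] <= threshold]
    right = [r for r in records if r[0] > threshold]
    return left, right
```

Applying these steps recursively to each partition grows the tree. Keeping the steps separate, as the source does, allows the split-search step to be replaced (for example by the evolutionary search mentioned below) without touching the rest.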

USE - The decision tree system is used in data mining utilizing 
a storage module and an object oriented linking module for linking the 
decision tree system and the storage module. It is used in 
astrophysics, detecting credit card fraud, assuring the safety and 
reliability of the nation's nuclear weapons, nonproliferation and arms
control, climate modeling, the human genome effort, computer network 
intrusions, revealing consumer buying patterns, recognizing faces, 
recognizing eyes, recognizing fingerprints, analyzing optical 
characters, analyzing the makeup of the universe, analyzing atomic 
interactions, web mining, text mining, multi-media mining, and 
analyzing data gathered from simulations, experiments, or observations. 
Data mining is useful for mining scientific data including astronomy, 
biology, chemistry, and remote sensing; business data including 
detecting credit card fraud, market-basket analysis, and customer
retention; and engineering data including network intrusion detection, 
identifying damage in structures (e.g. bridges, airplanes, or 
buildings), identifying coherent structures in turbulent flow, and
optimizing engineering designs. It can also be used in computer
vision and military applications.

ADVANTAGE - The inventive system is scalable with an increasing number
of processors, making it well suited to mining massive data sets.
Decision trees are simple to implement, yield results that can be
interpreted, and have built-in dimension reduction. The invention
employs evolutionary algorithms (EAs) which, unlike CART and OC1, are
not limited to considering one coefficient at a time, and which find
better splits than the simple greedy hill climbers currently in use.
EAs eliminate the need for optimal splits, have good scalability
properties, use problem-specific knowledge (i.e. reduce execution time
by using known 'good' solutions to seed the initial population),
exhibit tolerance to noise, and are implemented on parallel computers,
thus providing promising expected performance improvements.

DESCRIPTION OF DRAWING(S) - The figure is a flow chart illustrating
that the data mining process is iterative and interactive.
Abstract (Basic): WO 200350695 A1

NOVELTY - The template (100) includes one or more normative
statements and one or more indicator statements associated with each of
the normative statements. A response to each indicator statement
defines an indicator statement score. The template further has one or
more headings and one or more categories associated with each of the
headings. A category score is computed by weighting and summing the
normative scores under each of the headings. A heading score is
computed by biasing the category scores via a GMI score and an
asymmetric geometric scoring technique. An estimated risk score is
computed by biasing the heading scores based on a GMI curve.

DETAILED DESCRIPTION - The GMI curve is a skewed normalized 
distribution curve. 

INDEPENDENT CLAIMS are also included for the following: 

(a) a scoring method to calculate risk; 

(b) a scoring methodology; 

(c) an article of manufacture. 

USE - For statistical analysis of marketing data. 



ADVANTAGE - Combines asymmetric and non-linear arithmetic scoring 
based on relative scores across a universe of entities. 

DESCRIPTION OF DRAWING(S) - The figure shows the organizational
view of the research template.

Template (100)
Abstract (Basic): US 20030105658 A1

NOVELTY - A CPU (32) processes customer records, and the
processed customer records are stored in a data warehouse server (12).
An online analytical processing (OLAP) profiling engine (16)
builds and updates customer behavior profiles (44) by mining the
customer records that flow into the server, and derives similarity
measures on patterns extracted from the behavior profiles, which are
defined as data cubes (42).
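A small Python sketch can show what "profiles defined as data cubes" with similarity measures might look like. It is illustrative only. It assumes that a profile is the probability distribution of a customer's records over cells keyed by chosen dimensions, which fits the "probability oriented" profiling mentioned below. It also assumes that similarity is cosine similarity. The record fields and function names are hypothetical.

```python
import math
from collections import Counter

def build_profile(records, dims):
    """Aggregate records into a small 'cube' keyed by the chosen dimensions
    and normalize cell counts to probabilities (assumed profile form)."""
    counts = Counter(tuple(r[d] for d in dims) for r in records)
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}

def cosine_similarity(p, q):
    """Cosine similarity between two sparse profiles (0.0 if either is empty)."""
    cells = set(p) | set(q)
    dot = sum(p.get(c, 0.0) * q.get(c, 0.0) for c in cells)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)
```

Passing fewer dimensions to `build_profile` rolls the cube up to a coarser level. This gives a rough analogue of the multilevel comparison described under ADVANTAGE.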

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for a
customer behavior pattern comparison method.

USE - For conducting customer behavior pattern analysis in
telecommunication applications and electronic commerce applications for
fraud detection, personalized/targeted marketing and commercial
promotion.

ADVANTAGE - Enables application of OLAP based solutions to 
probability oriented, scalable profiling and multilevel, 
multidimensional pattern analysis and comparison so that extracted 
patterns can be used to provide guidelines when making business 
decisions such as service provisioning, performing trend analysis, and 
detecting abnormal behavior. 

DESCRIPTION OF DRAWING(S) - The figure shows the block diagram of
the customer profiling apparatus. 

data warehouse server (12) 
OLAP profiling engine (16) 

CPU (32) 

data cubes (42)

customer behavior profiles (44) 
Data mining apparatus for industry application, analyzes stored data
Abstract (Basic): US 20020161760 A1

NOVELTY - A data collector (208) collects data from several client 
industrial systems and stores the collected data in a data warehouse 
(210). An online analytical processor (212) analyzes the stored 
data about the industrial systems and presents analysis results to the 
user through a user-interface. 

USE - For mining data and providing services over the Internet,
Ethernet, intranets or LANs for industrial applications, marketing
research, scientific research, economics, criminology and other fields.

ADVANTAGE - The user can access best-practice information.
Confidentiality is provided to users, and users are encouraged to
return to the web site frequently. The user is able to view current
equipment performance and evaluate past performance trends reliably.

DESCRIPTION OF DRAWING(S) - The figure shows a flow diagram of the
data mining apparatus.

Data collector (208) 

Data warehouse (210) 
Online analytical processor (212) 
Abstract (Basic): WO 200159674 A1

NOVELTY - An associating device relates digital images of products 
with statistical information. A calculating device computes information 
from the statistical information of each product. The statistical 
information, computed information and digital images are associated 
with each product and recorded in a storage device. 

USE - For on-line marketing, merchandising and promotion
planning.

ADVANTAGE - Uses a single, easy-to-use statistical analysis tool
to view data from current and past on-line and print promotions. Allows
the user to enter a product picture, assign a rating, include comments
and route product information to other users to acquire comments and
feedback. Enables an on-line analytical processing tool to be
integrated to perform sophisticated, multidimensional analysis of
warehoused data.

DESCRIPTION OF DRAWING(S) - The figure shows a diagram of the
infrastructure of an information analyzing system.
Abstract (Basic): WO 200111497 A1

NOVELTY - Each data location in a multidimensional database (MDB) is
specified by integer-encoded business dimensions associated with the
data. An address data mapping unit maps the integer-encoded MDB
dimensions to integer-encoded data storage addresses within the memory
associated with the MDB, using a modular arithmetic function. A data
accessing unit accesses a data element in memory using the map
information.
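The abstract does not give the actual modular arithmetic function. A minimal Python sketch of one plausible scheme follows. It assumes that the integer-coded dimension members are first linearized in row-major order. The linear index is then spread round-robin over the processors: index mod P selects the processor and index div P gives the local offset. Both the scheme and the names are assumptions.

```python
def linear_index(coords, sizes):
    """Row-major (mixed-radix) encoding of integer-coded dimension members."""
    if len(coords) != len(sizes):
        raise ValueError("need exactly one coordinate per dimension")
    index = 0
    for c, size in zip(coords, sizes):
        if not 0 <= c < size:
            raise ValueError(f"coordinate {c} outside 0..{size - 1}")
        index = index * size + c
    return index

def map_address(coords, sizes, num_processors):
    """Assign a cell to (processor, local offset) with modular arithmetic,
    so consecutive cells are distributed round-robin across processors."""
    idx = linear_index(coords, sizes)
    return idx % num_processors, idx // num_processors
```

For example, in a cube of 3 products x 4 regions x 12 months, cell (2, 1, 5) has linear index ((2 x 4) + 1) x 12 + 5 = 113. On 4 processors, `map_address` places it on processor 1 at offset 28. Each processor can compute any address locally, which is consistent with the claimed reduction in inter-process communication.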

DETAILED DESCRIPTION - A parallel computing platform has processors 
and memories for storing data elements in integer encoded address. 
INDEPENDENT CLAIMS are also included for the following: 

(a) Data element accessing method; 

(b) Data element management system; 

(c) Data element management method; 

(d) Internet URL directory system; 

(e) Internet enabled system 

USE - For accessing a multidimensional database (MDB), such as a data
warehouse in a business organization, for on-line analytical
processing. The MDB is used in on-line e-commerce shopping systems for
storing consumer shopping profile information, in URL directory systems
used for data mixing on the Internet, and in other MDB-based systems
used for predictive business modeling for applications such as database
marketing, financial/risk analysis, fraud management, bioinformatics,
return-on-investment justification, business intelligence applications,
customer relationship management, enterprise information portals, and
systems supporting real-time control of packet routers, switches and
other Internet devices and real-time control of automated parcel
routing and sortation systems.

ADVANTAGE - Improved data access is provided by the parallel
computing platform. Inter-process communication among parallel
processors is minimized. Fast, affordable and easy access is provided
to customers, enabling companies to market products and services more
effectively over the Internet. Supports real-time control of processors
in response to complex states of information reflected in the MDB.

DESCRIPTION OF DRAWING(S) - The figure shows the schematic
representation of data element address assignment method. 
English Abstract

The present system includes the aggregation, anonymization (150),
analysis, and dissemination of information including susceptibility,
progression and severity of disease (113, 115, 123, 131), the resources
utilized to treat those diseases (107), quality of life of patients who
have those diseases, ability to participate in the workforce and,
ultimately, survival. This information, when analyzed for patients along
with relatives of those patients, provides an understanding of why
genetically similar or identical patients express diseases differently
(137). Collecting the information and linking it to genotypic
information determines the role the genetic makeup of an individual
plays in disease contraction, treatment and outcome
(150, 151, 153, 157).

Main International Patent Class: G06F-017/60 
Fulltext Availability: 
Detailed Description 

Detailed Description 

... 40 may be integrated with hospital 10 and customer segments 72 via
direct linking of data warehouse 150, local outcomes data
warehouse 14, and clinical outcomes research 22. Customer segments 72
may include pharmaceutical, biotechnology, genomics or other third-party
businesses. Customer segments 72 may also include applications 35 such
as drug design, discovery and marketing divisions, patient management
insights, and e-health content (FIG. 5c). In one embodiment, vendor 40
may use data mining application 41 to mine data warehouse 150 and
apply the information gained therefrom to insights developments 42.
Insights 62 et seq. ...

...companies 76, care providers 76, and patients 101 (FIG. 5d). Care
providers/payers may access data warehouse 150 via web-based Internet
connections 25.

Referring now to FIGS. 6a-6c, hospital 10... 
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Claims 

Claim 

... 160 are met, bid 154 is subjected to a simulated
bid opening analysis 161 to predict whether the bid can be expected to
be a winning bid. An outcome of a sealed bid auction depends on the sizes
of the bids received from each bidder...

...not know the bids placed by other bidders until the bids are opened, 
making the outcome of the auction uncertain. By placing higher bids, a 
probability that the auction will be... 

...for example a Monte Carlo analysis, many scenarios are simulated to
produce a distribution of outcomes. The distribution of outcomes
includes a probability of winning the auction item(s) and the value gain.
By varying...
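The Monte Carlo bid-opening analysis can be sketched in Python as follows. The sketch makes several assumptions that do not come from the source. Each rival's bid is drawn from an independent normal distribution. The highest bid wins and the winner pays its own bid (first-price). The value gain is the item's value minus the bid. Only the mean gain is returned, not the full distribution.

```python
import random

def simulate_sealed_bid(our_bid, item_value, rival_bid_params,
                        trials=10_000, seed=0):
    """Estimate (probability of winning, mean value gain) for one bid.

    rival_bid_params: list of (mean, std_dev) per rival, an assumed normal
    model of competitors' bids. Ties and unknown rivals are not modeled.
    """
    rng = random.Random(seed)
    wins = 0
    total_gain = 0.0
    for _ in range(trials):
        # Simulate one scenario: every rival submits a sealed bid.
        rival_max = max((rng.gauss(mu, sd) for mu, sd in rival_bid_params),
                        default=float("-inf"))
        if our_bid > rival_max:
            wins += 1
            total_gain += item_value - our_bid  # first-price: pay own bid
    return wins / trials, total_gain / trials
```

If the same seed is used while `our_bid` is varied, the rival scenarios stay identical across runs. This makes the trade-off between a higher chance of winning and a smaller value gain easy to compare.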

...of market rules and contracts into computerized business rules,
codification of potential competition/market forces, forecasted budgets
and priorities into a preference matrix, one's own bidding capacity,
preferences, risk/return ... established criteria 80 and selected data 78
as to third portion or remainder 42 and divides third portion 42 into
portions 46, and then further divides each portion 46 into categories
48 and 50 and category 50 into clusters 52, 54 and clusters 52, 54 into
subclusters 56, 58, 60, 62 and 64 using criteria 80 imported from
database 76 and each of processes 206 and 208. Individual asset
valuations are established for the...

...subclusters 56, 58, 60, 62 and 64 by statistical inference.
The individual asset valuations are listed in cluster tables 136 (see
Figure 3) and after adjustment 138, listed in a credit analyst table 140.
The established criteria 80 are objective since criteria 80 come from
database 76 where they have been placed during full underwriting
procedure 14 and sample underwriting procedure ... credit analyst table 140
and untouched asset table 144 for all assets is placed into database 76
in a digital storage device, such as the hard disk storage 178 of
computer...

...increase the accuracy of statistically inferred valuation 142 by
correlating to established criteria 80 in database 76 on assets in
fully underwritten first portion 16 and assets in sample underwritten
second...

...to selected data 78 on assets in portions 16 and/or 36 are located in
database 76 and then by statistical inference, a value for each asset in 
third portion 42 a portfolio with a forecasted cash flow 
recovery may be evaluated by a number of valuation techniques. The 
typical objective... 

...are ranked in order of their capability to accurately quantify cash
flow, or cash equivalent, forecasts with the least downside variances 
and/or maximum upside variances. The asset is valued by... 

...s valuation once the best method has been employed. In order to provide
the best forecast of asset value, assets are evaluated by each method
within a food chain until such... a valuation to the raw data and this
rule set is coded into the valuation database in the form of criteria
80. Each time a cluster is touched by multiple hits during a valuation in
procedures 14, 34 or 40, a consensus forecast is developed and applied
to the cluster. In accordance with system 28, the probability
distributions ... applicable, such as by way of example without limitation,
legal climate, gross domestic product ("GDP") forecast, guarantor
climate, collections efficiency, borrower group codes, and the like.
One method for sampling a...
The appropriate variance-adjusted forecast is made for each asset and
the valuation tables are constructed to include every asset...

...capital, plus FX swap cost, plus risks in general uncertainties inherent
in the variances of forecasted cash flow recovery. If it appears that
there is more than a five-percent certainty...

...probability of maximum upside probabilities is even more attractive to
investors.

The aggregated portfolio is divided into separately marketable
subportfolios or tranches. Each tranche has a forecasted cash flow
probability distribution and time duration from prior analytics. These
tranches are then given... are sampled 242 according to risk. Second,
assets are underwritten 244, and valuations recorded. Third, market
value clusters are formed 246, such as by FCM, as described below.
Fourth, regression models are built...

...inferentially valued portion 42 of portfolio 12 in a manner weighted by
the counts to predict individual values for each of the
non-underwritten assets. The individual asset values produced according
previous appraisal amount, market value cluster (predicted from
previous appraisal amount, land area, building area, current appraisal
amount, court auction realized price...

...a notion of worth to collateral assets. The underwritten valuations are
stored in a master database table, such as database 76 (shown in
Figure 2). Valuations are typically summarized in terms of monetary units
(e...
ABSTRACT EP 887749 A2

A method for detecting a selectable number of groups of objects having 
at least one selectable characteristic from a population of objects 
specifiable by a plurality of attributes comprising the following steps: 

- subdividing the objects of the population into object groups of a first 
order, on the basis of respectively at least one attribute, 

- detecting the quality of each object group of this order, on the basis 
of the total number of its objects and the number of its objects having 
said at least one characteristic and/or the number of its objects not 
having said characteristic, 

- for each object group, including the object group of this order into 
the number of object groups to be detected, if the object group has a 
quality higher than the lowest quality of the object group among the 
object groups detected up to this point, 

- for each object group of this order, detecting at least one
hypothetical first quality on the basis exclusively of those objects of 
this object group which have at least one of said characteristics, and/or 
at least one hypothetical second quality on the basis exclusively of 
those objects of this object group which do not have at least one of said 
characteristics, wherein the at least one of the first or second 
qualities is a quality of a hypothetical group derived from the actually 
processed group and comprising exclusively the objects thereof having one 
or not having one of said at least one characteristics, and 

- subdividing all those object groups of this order the at least one of 
the respectively assigned first or second hypothetical qualities of which 
comprise a selectable quality value and particularly are of the quality 
of the respective object groups, into object groups of the next lower 
order, by selecting at least one attribute. 
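The claimed steps can be illustrated with a minimal Python sketch of top-k subgroup search. Several parts are swapped in and should be read as assumptions. The group "quality" is taken to be weighted relative accuracy (WRAcc), which depends only on the group size and on how many of its objects have the characteristic. The "hypothetical first quality" is taken to be the quality of the group restricted to its objects that have the characteristic. A group is subdivided further only if that hypothetical quality could still beat the weakest group kept so far. The "second" hypothetical quality (objects without the characteristic) is not modeled.

```python
import heapq
from itertools import count

def wracc(n, p, N, P):
    """Assumed quality: weighted relative accuracy of a group of n objects,
    p of which have the characteristic (population totals N and P)."""
    return 0.0 if n == 0 else (n / N) * (p / n - P / N)

def optimistic_quality(p, N, P):
    """Quality of the hypothetical group keeping only the group's positives."""
    return wracc(p, p, N, P)

def top_k_subgroups(objects, attrs, target, k, max_depth=2):
    if k <= 0:
        return []
    N = len(objects)
    P = sum(1 for o in objects if o[target])
    best = []  # min-heap of (quality, tiebreak, description)
    tiebreak = count()

    def visit(desc, members, start, depth):
        n = len(members)
        p = sum(1 for o in members if o[target])
        q = wracc(n, p, N, P)
        if desc:  # the whole population is not reported as a group
            if len(best) < k:
                heapq.heappush(best, (q, next(tiebreak), desc))
            elif q > best[0][0]:
                heapq.heapreplace(best, (q, next(tiebreak), desc))
        # Subdivide into next-order groups only if the positives-only
        # hypothetical group could beat the weakest group kept so far.
        if depth < max_depth and (len(best) < k or optimistic_quality(p, N, P) > best[0][0]):
            for i in range(start, len(attrs)):
                a = attrs[i]
                for v in sorted({o[a] for o in members}):
                    subgroup = [o for o in members if o[a] == v]
                    visit(desc + ((a, v),), subgroup, i + 1, depth + 1)

    visit((), objects, 0, 0)
    return [desc for _, _, desc in sorted(best, reverse=True)]
```

Each description is a tuple of (attribute, value) conditions. Attributes are added only in list order, so the same group is never reached twice by a different ordering of the same conditions.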

ABSTRACT WORD COUNT: 304 
NOTE: 

Figure number on first page: 11A 

LANGUAGE (Publication, Procedural, Application): English; English; English
FULLTEXT AVAILABILITY: 

Available Text Language Update Word Count 
Total word count - document A 11059
Total word count - document B 0 
Total word count - documents A + B 11059 
INTERNATIONAL PATENT CLASS: G06F-017/30 ...

... G06F-009/44

...SPECIFICATION introducing a number of space-saving abbreviations for
field names and field values).

From such data, traditional data mining software can produce
different kinds of knowledge: decision tree algorithms induce models 
that predict whether a certain customer will reply to a future mailing, 
clustering algorithms segment our customer base into homogeneous groups 
that can be treated together in marketing campaigns, etc. 

1.2 Problems with the single-table assumption 
Furthermore, since we can add...



15/3,AE,K/7 (Item 6 from file: 349) 

DIALOG (R) File 349: PCT FULLTEXT

(c) 2004 WIPO/Univentio. All rts. reserv. 

00989416 

METHOD AND SYSTEM FOR PLACEMENT, MONITORING AND MEASUREMENT OF INTERACTIVE 
ADVERTISING 

PROCEDE ET SYSTEME DE MISE EN PLACE, DE SURVEILLANCE ET DE MESURE D'UNE
PUBLICITE INTERACTIVE 

Patent Applicant/Assignee:

KENT RIDGE DIGITAL LABS, 21 Heng Mui Keng Terrace, Singapore 119613, SG,
SG (Residence), SG (Nationality), (For all designated states except:
US)

Patent Applicant/Inventor:

PADMANABHAN Ramanath, Block 506, #08-219, West Coast Drive, Singapore
120506, SG, SG (Residence), IN (Nationality), (Designated only for: US)
SITARAM Ranganatha, Block 218, Choa Chu Kang Central, #02-250, Singapore
680218, SG, SG (Residence), IN (Nationality), (Designated only for: US)
Legal Representative:

KANG Alban (et al) (agent), Alban Tay Manhtani & De Silva, 39 Robinson 
Road, #07-01 Robinson Point, Singapore 068911, SG, 
Patent and Priority Information (Country, Number, Date):

Patent: WO 200319444 A1 20030306 (WO 0319444)

Application: WO 2001SG169 20010823 (PCT/WO SG0100169) 

Priority Application: WO 2001SG169 20010823 
Designated States: 

(Protection type is "patent" unless otherwise stated - for applications 
prior to 2004) 

AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ 
EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR 
LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL 
TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW 

(EP) AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR 

(OA) BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG 

(AP) GH GM KE LS MW MZ SD SL SZ TZ UG ZW 

(EA) AM AZ BY KG KZ MD RU TJ TM 
Publication Language: English 
Filing Language: English 
Fulltext Word Count: 7144 

English Abstract 

A decision support system for an interactive advertising system, the 
decision support system being for collecting and analysing data obtained 
through interaction by at least one user with the interactive advertising 
system, the user using a user's machine for the interaction, and 
measuring the effectiveness of the advertisement. Also disclosed is a 
method for placement, monitoring and measurement of an interactive 
advertising system having an interactive advertising device for 
displaying at least one advertisement, the method including the steps of
specifying the goals of the at least one advertisement; placing the at
least one advertisement using data obtained from previous advertising 
using the interactive advertising system; monitoring user interactivity 
with the interactive advertising system as a result of the at least one 
advertisement so as to collect data; and using the data to determine the 
effectiveness of the at least one advertisement. 

Main International Patent Class: G06F-017/60
Fulltext Availability: 
Claims 

Claim 

... of the data collection and analysis systems and methods described and
claimed herein. The demographics database is a summary of the above two
databases with respect to the lifestyles, interests, habits, and
behaviour of the users/respondents. This data may also contain certain
estimates, predictions and rules developed by a variety of data mining
tools acting on the interaction data. Figure 4 shows the database
schemes for the interaction database given an interactive advertisement
in a bus. The database is implemented as a relational database so
that it may be easily queried and made accessible for subsequent
analysis and measurement. The marketing and demographics databases
are mainly derived from the interaction database using existing data
mining techniques and tools. Some methods that are specific to
interactive advertisement are described below. Interaction ... the present
invention, the objects to be classified are generally represented by
records in a database, and the act of classification consists of
updating each record by completing a field with ... limited number of
classes, and any record may be applied to any one of them. Decision
trees and memory-based reasoning are techniques well suited for
classification; and link analysis may, in... the population segments
that have responded to similar advertisements in that cluster in the
past. Market basket analysis, memory-based reasoning, decision trees,
and artificial neural networks are suitable for use in prediction. The
choice of the technique ... perform this analysis, all the data is
grouped into market baskets including all the advertisements to which
each viewer has responded. Based on these baskets...
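The basket-building step described above can be sketched in Python. The sketch then derives simple pairwise association rules. It assumes that interactions arrive as (viewer, advertisement) pairs and stops at two-item rules with support and confidence, not a full Apriori search. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_baskets(responses):
    """Group (viewer_id, ad_id) interaction records into one basket per viewer."""
    baskets = defaultdict(set)
    for viewer, ad in responses:
        baskets[viewer].add(ad)
    return baskets

def pair_rules(baskets, min_support):
    """Return sorted (antecedent, consequent, support, confidence) tuples
    for advertisement pairs whose support meets `min_support`."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for ads in baskets.values():
        item_counts.update(ads)
        pair_counts.update(combinations(sorted(ads), 2))
    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n  # fraction of viewers who responded to both ads
        if support < min_support:
            continue
        rules.append((a, b, support, c / item_counts[a]))
        rules.append((b, a, support, c / item_counts[b]))
    return sorted(rules)
```

A rule such as ("ad2", "ad1", 0.67, 1.0) would read as follows. Every viewer who responded to ad2 also responded to ad1, and two thirds of all viewers responded to both.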
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English Abstract 

A technique to produce a marketing campaign is described. The technique 
scores a data set of prospects using a plurality of models that estimate 
state transition probabilities for the prospects, with the models based 
on samples of potential contacts and their responses, and scores the data 
set with a plurality of valuation models to determine rewards gained from 
the prospects in the data set. The model combines the probability of the 
event occurring for the prospects and values of the prospects to provide 
targetability value estimates for the prospects by using Markov decision 
processes from outputs of the scoring and solves the Markov decision 
processes for the prospects. 
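The solving step can be illustrated with a minimal Python sketch of value iteration. Value iteration is a standard way to solve a finite Markov decision process. The abstract does not say which solver the invention uses. The sketch assumes that the probability models supply state-transition probabilities per action and that the valuation models supply an expected immediate reward per state and action. The state names, actions and numbers in the usage example are hypothetical.

```python
def q_value(s, a, values, transition, reward, gamma):
    """Expected discounted value of taking action a in state s."""
    return reward[s][a] + gamma * sum(p * values[t] for t, p in transition[s][a].items())

def value_iteration(actions, transition, reward, gamma=0.9, tol=1e-9, max_iter=10_000):
    """Solve a finite MDP by value iteration.

    actions[s]       -> list of actions available in state s
    transition[s][a] -> {next_state: probability}  (from probability models)
    reward[s][a]     -> expected immediate value   (from valuation models)
    """
    values = {s: 0.0 for s in actions}
    for _ in range(max_iter):
        new_values = {
            s: max(q_value(s, a, values, transition, reward, gamma) for a in acts)
            for s, acts in actions.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in actions)
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(acts, key=lambda a: q_value(s, a, values, transition, reward, gamma))
        for s, acts in actions.items()
    }
    return values, policy

if __name__ == "__main__":
    actions = {"prospect": ["mail", "skip"], "customer": ["serve"]}
    transition = {
        "prospect": {"mail": {"customer": 0.3, "prospect": 0.7}, "skip": {"prospect": 1.0}},
        "customer": {"serve": {"customer": 0.8, "prospect": 0.2}},
    }
    reward = {"prospect": {"mail": -1.0, "skip": 0.0}, "customer": {"serve": 5.0}}
    values, policy = value_iteration(actions, transition, reward)
    print(policy)  # {'prospect': 'mail', 'customer': 'serve'}
```

In this toy example, mailing a prospect costs 1 unit but converts 30% of the time, while a customer is worth 5 units per period. Mailing is therefore the optimal action. The state value of a prospect then serves as a simple targetability estimate.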

Main International Patent Class: G06F-017/60 
International Patent Class: G06F-017/30 
Fulltext Availability: 
Detailed Description 

Detailed Description 

... COMBINING VALUE AND PROBABILITY MODELS IN DATABASE MINING
BACKGROUND 

This invention relates generally to data mining
software.

Data mining software extracts knowledge that
may be suggested by a set of data. For example, data
mining software can be used to maximize a return on
investment in collecting marketing data, as well as other
applications such as credit risk assessment, fraud
detection, process control, medical diagnoses and so
forth. Typically, data mining software uses one or a
plurality of different types of modeling algorithms in
combination with...

...rate, behavioral response or other 

output from a targeted group of individuals represented 
by the data . Generally, data mining software executes 
complex data modeling algorithms such as linear 
regression', logistic regression, back propagation neural 
network, Classification and Regression Trees (CART) and 
Chi squared Automatic Interaction Detection (CHAD) 

decision trees , as well as other types of algorithms on a 
set of data. 

SUMMARY 

According to . . . 
English Abstract

A method executed on a computer for modeling expected behavior is 
described. The method includes scoring records of a dataset that is 
segmented into a plurality of data segments using a plurality of models 
and converting scores of the records into probability estimates. Two of 
the techniques described for converting scores into probability estimates 
are a technique that transforms scores into the probability estimates
based on an equation and a binning technique that establishes a plurality 
of bins and maps records based on a score for the record to one of the 
plurality of bins. 
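Below is a minimal sketch of the binning technique. It assumes equal-count bins over training scores and an observed 0/1 response for each training record. The bin count, the data and the function names are hypothetical, and the patent's own bin construction may differ.

```python
from bisect import bisect_right


def fit_bins(scores, outcomes, n_bins):
    """Sort training records by score, cut them into n_bins equal-count bins
    (the last bin absorbs any remainder) and use each bin's observed response
    rate as its probability estimate. Requires len(scores) >= n_bins."""
    pairs = sorted(zip(scores, outcomes))
    size = len(pairs) // n_bins
    edges, rates = [], []
    for b in range(n_bins):
        end = len(pairs) if b == n_bins - 1 else (b + 1) * size
        chunk = pairs[b * size:end]
        if b > 0:
            edges.append(chunk[0][0])  # lowest score that falls in this bin
        rates.append(sum(y for _, y in chunk) / len(chunk))
    return edges, rates


def score_to_probability(score, edges, rates):
    """Map a new record's score to its bin and return that bin's estimate."""
    return rates[bisect_right(edges, score)]


# Hypothetical model scores and observed 0/1 responses
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
edges, rates = fit_bins(scores, outcomes, n_bins=2)
print(edges, rates, score_to_probability(0.45, edges, rates))
```

One limitation of this simple version: tied training scores can straddle a bin boundary. A production version would move bin edges to fall between distinct scores.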

Main International Patent Class: G06F-017/30 
Fulltext Availability: 

Detailed Description 
Detailed Description 

EXECUTION OF MULTIPLE MODELS USING DATA SEGMENTATION 

BACKGROUND 

This invention relates generally to data mining software. 

Data mining software extracts knowledge that may be suggested 
by a set of data. For example, data mining software can be used to 
maximize a return on investment in collecting marketing data, as well 
as other applications such as credit risk assessment, fraud
detection, process control, medical diagnoses and so forth. Typically, 
data mining software uses one or a plurality of different types of 
modeling algorithms in combination with. . . 

...rate, behavioral response or other output from a targeted group of
individuals represented by the data. Generally, data mining
software executes complex data modeling algorithms such as linear
regression, logistic regression, back propagation neural network,
Classification and Regression Trees (CART) and Chi2 (Chi squared) Automatic
Interaction Detection (CHAID) decision trees, as well as other types
of algorithms on a set of data.
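CHAID-style trees choose splits using chi-squared tests. The sketch below is an illustration only, not a full CHAID implementation: it does no category merging and no Bonferroni adjustment. It scores binary candidate splits by the Pearson chi-squared statistic of a 2x2 table. It reports each score as logworth, -log10(p); with one degree of freedom the p-value is erfc(sqrt(chi2/2)). The split names and counts are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-squared statistic (no continuity correction) for a 2x2 table
    [[a, b], [c, d]]: rows are the two branches of a candidate split,
    columns are the two target classes."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def logworth(table):
    """-log10 of the chi-squared p-value; with 1 degree of freedom the
    upper-tail probability is erfc(sqrt(chi2 / 2))."""
    p = math.erfc(math.sqrt(chi_square_2x2(table) / 2))
    return -math.log10(p) if p > 0 else float("inf")


# Hypothetical candidate splits: [[responders, non-responders] per branch]
candidates = {
    "region == 'north'": [[30, 70], [60, 40]],
    "age < 40": [[45, 55], [48, 52]],
}
best = max(candidates, key=lambda name: logworth(candidates[name]))
for name, table in candidates.items():
    print(f"{name}: logworth = {logworth(table):.2f}")
print("best split:", best)
```

Higher logworth means stronger evidence that the split separates the target classes, so the first candidate is preferred here.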
ABSTRACT EP 1315110 A2

A system, including a planning module, a control module and a receiver 
module, configured to schedule display of advertisements to achieve an 
advertising impression goal. The planning module enables scheduling of 
advertising impressions in accordance with target criteria. Further, the 
planning module enables selecting an advertising impression goal for 
advertisements, assigning an advertising type and defining a weight for 
the advertisement. The control module receives the schedule, the
advertising type and the defined weights and generates one or more
metadata files that contain target criteria, advertising type and weights
for the advertisement. The one or more metadata files and advertising
content for the advertisement are delivered to the receiver module that
is configured to define a display frequency for the advertising content 
based upon one or more of the metadata files. The receiver module 
selectively displays the advertising content of the advertisement to 
achieve the advertising impression goal. 

ABSTRACT WORD COUNT: 149
LANGUAGE (Publication, Procedural, Application): English; English; English
CLAIMS A (English) 200322 1470

SPEC A (English) 200322 17455 
Total word count - document A 18925 
Total word count - document B 0 
Total word count - documents A + B 18925 

...SPECIFICATION Illustratively, overall advertising inventory module 50 
receives data from date dimension 62, time dimension 64, marketing area 
dimension 66, and ad space dimension 68. The data stored within overall 
advertising inventory. . . 

...50 is representative of any combination of values of date dimension 62, 
time dimension 64, marketing area dimension 66, and ad space dimension 
68. Each combination of values is unique for... 
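To illustrate the uniqueness constraint on dimension combinations, the Python sketch below keys inventory records by a (date, time, marketing area, ad space) tuple and rejects duplicates. The field names, sample values and helper functions are hypothetical. The patent describes data module 34 in terms of relational tables rather than this in-memory structure.

```python
from typing import Dict, NamedTuple


class InventoryKey(NamedTuple):
    """One combination of the four dimensions (field names are hypothetical)."""
    date: str
    time_slot: str
    marketing_area: str
    ad_space: str


def add_inventory(store: Dict[InventoryKey, int], key: InventoryKey, units: int) -> None:
    """Record available impressions for a dimension combination; a second entry
    for the same combination is rejected so every combination stays unique."""
    if key in store:
        raise ValueError(f"duplicate dimension combination: {key}")
    store[key] = units


def total_for_area(store: Dict[InventoryKey, int], area: str) -> int:
    """Roll the inventory up over every other dimension for one marketing area."""
    return sum(units for key, units in store.items() if key.marketing_area == area)


store: Dict[InventoryKey, int] = {}
add_inventory(store, InventoryKey("2002-06-01", "prime", "Denver", "banner"), 500)
add_inventory(store, InventoryKey("2002-06-01", "late", "Denver", "banner"), 200)
add_inventory(store, InventoryKey("2002-06-01", "prime", "Boston", "banner"), 300)
print(total_for_area(store, "Denver"))
```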



...The particular embodiment of data module 34 will be discussed with
respect to a relational database; however, one skilled in the art can
appreciate that data module 34 can store the...
...variety of other structures, such as but not limited to, a
multidimensional data cube, an OLAP data store, or the like.
As shown, module 50 includes a total inventory attribute 70...
English Abstract

A system and method for creating a unique alias associated with an 
individual identified in a health care database such that health care 
data, and particularly pharmaceutical-related data, can be efficiently 
gathered and analyzed. The system has a first data store (302) for 
storing at least one record where each record includes a plurality of 
identification fields which when concatenated uniquely identify an 
individual, and at least one health care field corresponding to health 
care data associated with the individual. The system also has a second data
store (304), and a processor. The processor selects a record of the first 
data store, then selects a subset of the plurality of identification 
fields within the selected record, and concatenates the selected subset 
of identification fields. Then the processor stores the concatenated 
identification fields in a record in the second data store with the at 
least one health care field from the selected record of the first data 
store.
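Here is a minimal sketch of the alias step. It assumes the identification subset is last name, birth date and gender, the fields named in the claims excerpt, joined with a "|" delimiter. The full text also mentions encrypting the identifier, so the sketch derives the alias with a keyed SHA-256 HMAC. That is a one-way keyed hash rather than reversible encryption. This choice, the key and all field names (including `ndc_code`) are assumptions, not the patent's specified method.

```python
import hashlib
import hmac

# Identification subset named in the claims excerpt; the delimiter is an assumption.
ID_FIELDS = ("last_name", "birth_date", "gender")


def make_alias(record: dict, secret_key: bytes) -> str:
    """Normalize and concatenate the selected identification fields, then derive
    a keyed SHA-256 digest so the alias is stable but does not expose the fields."""
    concatenated = "|".join(str(record[f]).strip().upper() for f in ID_FIELDS)
    return hmac.new(secret_key, concatenated.encode("utf-8"), hashlib.sha256).hexdigest()


def build_second_store(first_store, health_field: str, secret_key: bytes):
    """Write one record per input record holding only the alias and the health care field."""
    return [{"alias": make_alias(r, secret_key), health_field: r[health_field]}
            for r in first_store]


first_store = [
    {"last_name": "Smith", "birth_date": "1950-03-02", "gender": "F", "ndc_code": "0001"},
    {"last_name": "SMITH ", "birth_date": "1950-03-02", "gender": "f", "ndc_code": "0002"},
    {"last_name": "Jones", "birth_date": "1961-07-15", "gender": "M", "ndc_code": "0001"},
]
second_store = build_second_store(first_store, "ndc_code", secret_key=b"demo-key")
for row in second_store:
    print(row["alias"][:12], row["ndc_code"])
```

Normalizing case and whitespace before concatenation lets two claims for the same person link to one alias. The delimiter keeps values such as "AB"+"C" and "A"+"BC" from colliding.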



Fulltext Availability: 
Claims 

Claim 

. . . to segregate the claims data, it becomes much harder to generate 
valuable research and market data based upon the unique attributes 
for specific individuals, such as age, gender and geographic 
distribution. It is therefore desirable to provide the ability to
efficiently gather information from the claims databases to allow
research and analysis of the attributes that affect the pharmaceutical
industry. Accordingly, the... 

. . .method for creating a unique alias associated with an individual
identified in a health care database, that allows the aggregation of
segregated data for marketing research. The system may include a first
data store for storing at least one record. . .

...first data store and the second data store can either be located within
the same database or in separate databases.

The health care data stored within the first data store may, in one 
embodiment, correspond. . . 

. . .pharmaceutical claims data. The selected subset may correspond to a
specific person in the healthcare database, and the person's last name,
birthday, and gender are concatenated to form a unique ... method for
creating a unique alias associated with an individual identified in a
health care database, wherein the health care database stores at least
one record, and each record has a plurality of identification fields
which. . .
...the individual. The method includes the steps of selecting a record
within the health care database, selecting a subset of the plurality of
identification fields within the selected record, concatenating the...

. . .of identification fields, and storing the concatenated identification
fields in a record in a second database with the at least one health
care field from the selected record of the first...

...fields and the at least one health care field of each record of the
second database. The step of selecting a record within the health care
database may comprise selecting a record from pharmaceutical claims data.
Further, the step of concatenating the...

. . .of identification fields may comprise, for example, concatenating, for a
specific person in the healthcare database, that person's last name,
birthday, and gender. Thus, based on the concatenated identification
fields . . .

...records into a data cube. The step of selecting a record within the
health care database may comprise selecting records of the first data
store that are in tabular form, and. . . population identifiers allow users
to follow patients over time yielding important results unavailable in
other databases, such as patient drug switching behavior. By linking
medical and pharmacy transactions at the...

...can be determined. The report displayed by the system may contain
several attributes, such as: market shares; geographic information at
the national, regional, state and MSA levels; trends over time including...

...number of ways to help make business decisions, such as monitoring new
drug launches and marketing campaigns, enhanced sales force targeting,
and micro-marketing in select geographic areas or to select customers.
Furthermore, the system can be used for forecasting and development of a
pharmaceutical marketing strategy including indication-specific product
positioning, early warning market share shifts, clinical trial site
selection, investigator recruiting, and accurate intelligence on market
size and demand. Other objects, features, and advantages of the present
invention will become apparent ... identifier, encrypting this identifier
and removing specific patient identifying fields. Data is then loaded
into database tables (such as an Oracle database) at step 104 that
also reside on SITE 2. At step 105, SITE 2 runs...

...for analyzing and consolidating the data and for transforming the
resulting Oracle tables into OLAP cubes. The cube building process may
run on a different computer (such as SITE 2).

Cubes are modeled using an OLAP product on a desktop computer under,
for example, the Windows NT operating system. The cube...
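Turning tabular rows into OLAP cubes can be sketched as pre-aggregating a measure for every combination of dimensions. Dimensions that are rolled up are marked "ALL". The column names and prescription counts below are hypothetical. Commercial OLAP products build and store cubes very differently.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate `measure` for every subset of `dimensions` (a full cube).
    Dimensions left out of a subset are rolled up and keyed as 'ALL'."""
    cube = defaultdict(float)
    for r in range(len(dimensions) + 1):
        for subset in combinations(dimensions, r):
            for row in rows:
                key = tuple(row[d] if d in subset else "ALL" for d in dimensions)
                cube[key] += row[measure]
    return dict(cube)


# Hypothetical prescription counts loaded from a relational table
rows = [
    {"drug": "A", "state": "PA", "quarter": "Q1", "scripts": 120},
    {"drug": "A", "state": "NJ", "quarter": "Q1", "scripts": 80},
    {"drug": "B", "state": "PA", "quarter": "Q1", "scripts": 50},
    {"drug": "A", "state": "PA", "quarter": "Q2", "scripts": 130},
]
cube = build_cube(rows, ("drug", "state", "quarter"), "scripts")
share_a_q1 = cube[("A", "ALL", "Q1")] / cube[("ALL", "ALL", "Q1")]
print(f"Drug A market share in Q1: {share_a_q1:.0%}")
```

Because every roll-up is precomputed, a report such as a market share by quarter becomes a pair of lookups instead of a scan of the base table.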
07251836 Supplier Number: 61621900 (USE FORMAT 7 FOR FULLTEXT)
Supersized databases - Vendors are loading their offerings with advanced 
goodies. (Product Announcement) 

McCright, John S. 
PC Week, p14
April 24, 2000 

Language: English Record Type: Fulltext 
Article Type: Product Announcement 
Document Type: Magazine/Journal; Trade
Word Count: 632 

users will really dig data mining 
From PC Week Labs: Microsoft Corp. has upped the database ante
again, including data mining features in its upcoming SQL Server 2000
database, due by midyear (Beta 2 code ships this week). No other database
vendor has done this, and Microsoft's accessible data mining
interfaces and easy-to-use administration tools make this complex
technology approachable. Using SQL Server 2000's data mining engine, PC
Week Labs drilled down to find the most important factors affecting
specific criteria or to perform market segmentation analyses to
personalize Web sites. Data mining information is available through
either a relational or an online analytical processing interface. Four
decision trees (see screen) and two clustering data mining
algorithms are provided. This release also includes distributed query
features for scalable performance across shared-nothing clusters of
servers. For databases that can be divided into logical groupings,
this feature is going to provide incredible speed boosts. Look for a full
review of . . .
00776946

Satellite Software: Better Management With Fewer Hands 

Wireless Insider 

October 1, 2001 VOL: 2 ISSUE: 37 DOCUMENT TYPE: NEWSLETTER 
PUBLISHER: PHILLIPS BUSINESS INFORMATION 

LANGUAGE: ENGLISH WORD COUNT: 3002 RECORD TYPE: FULLTEXT 

(c) PHILLIPS PUBLISHING INTERNATIONAL All Rts. Reserv. 

TEXT: 

...operator can customize the look and feel of his
or her operation. "A super-macro decision tree, for example, allows the
operator to set up decisions in every command level on the... make our
software as versatile as possible," Graham says.

"With Crystal, we can address smaller market segments such as studio
management in broadcast facilities where scheduling and ancillary
equipment has to be. . .whole M&C sector as an evolving entity where
there is no end to the list of new features and new products.
His company's customers, which include DirecTV Inc.'s Los Angeles
Broadcast ... of our monitoring and control software.

The .Net platform will provide a rich set of database support, higher
level security, and a more flexible GUI," says Gray who describes
customers as...

...desk and SLA functions. Other functions such as interfaces to Oracle,
and MS ODBC (Open Database Connectivity) are also part of the
requirements."

"As satellite becomes an integral part of larger. . .
Web-Enabled CRM. (Buyers Guide)
Computer Telephony, v8, n4, p52 
April, 2000 

Language: English Record Type: Fulltext 
Article Type: Buyers Guide 
Document Type: Magazine/Journal; Trade
Word Count: 4479

...data from front office CRM apps like Siebel and Clarify, as well as
back office databases, e-commerce, and ERP systems, and uses it to create
OLAP reports and structure marketing campaigns. In addition to basic
data mining, E.piphany has a long list of sophisticated features
that encompass benchmarking, trend forecasting, outbound campaign creation
and management, and realtime targeted marketing. A...

...this work focuses on evaluating the performance of existing sales and 
marketing efforts, identifying key market drivers, and segmenting one's 
own customer base according to needs and opportunity. 
E.piphany recently scored a... 
PointClear (TM) Study Shows Companies are Shifting Their Marketing Mix in
2005 to Improve Lead Generation and Qualification 

PR NEWSWIRE (US) 
November 16, 2004 

JOURNAL CODE: WPRU LANGUAGE: English RECORD TYPE: FULLTEXT 
WORD COUNT: 702

(USE FORMAT 7 OR 9 FOR FULLTEXT) 

...opportunities, provides sales teams with ready buyers, and verifies
results. PointClear's comprehensive portfolio of outcome-based marketing
services includes: building/optimizing marketing databases,
multi-channel prospecting programs, in-depth prospect profiling,
fulfillment and e-fulfillment, list segmentation, target market
intelligence, closed-loop Sales Opportunity Management, ROI analysis, and
much more. PointClear, LLC
CONTACT: Denise. . .
Silver in the data mine

Cormier-Chisholm, James 
Futures v30n13 PP: 42-45 Oct 2001
ISSN: 0746-2468 JRNL CODE: CMM
WORD COUNT: 1673

ABSTRACT: Market analysis, at its most fundamental level, is pattern
recognition. Advanced data mining technologies are an efficient way of
arriving at short-term forecasts and of providing insights into the role
of predictive and explanatory variables. Data mining techniques like
Cart and Mars have an important advantage over neural net techniques: they
apply output rules and predictive models that are transparent to the
investor. For example, when investigating the silver market, investors can
examine decision trees, such as those produced by Cart, and multivariate
regression formulas produced by Mars, and determine...

...TEXT: result of long-term economic/political events or of short-term 
technical and speculative influences. Market knowledge of predictor 
variables gives traders a better grasp of how long they should carry a 
trade. 

DO. . . 
01397693 00-48680

Predictive modeling for non-statisticians

Abies, Geoffrey 

Target Marketing v20n3 PP: 114-116 Mar 1997 
ISSN: 0889-5333 JRNL CODE: ZIR 
WORD COUNT: 1204 

ABSTRACT: A few factors to consider when determining if predictive
modeling will turn up new useful information for your database
marketing are: 1. predictable behavior, 2. significant sample size, 3.
exhaustion of simple targeting techniques, 4. presence of predictive
information. A predictive model takes as input a listing of all
individuals who have displayed the desired behavior...

...then compares all the known characteristics about both of these groups.
The result is a decision tree, or mathematical formula, that defines
which characteristics are most descriptive in differentiating individuals
with the . . .
Data warehouses, marts, metadata, OLAP/ROLAP, and data mining--a glossary

Castelluccio, Michael 

Management Accounting v78n4 PP: 59-61 Oct 1996 
ISSN: 0025-1690 JRNL CODE: NAA 
WORD COUNT: 1083 

...TEXT: and MOLAP (multidimensional online analytical processing), as well
as two- and three-tiered OLAP.

Traditional databases are retrospective in design: What did we sell? What
debt have we accumulated? Data mining is prospective: What will be the
consequences of our debt in the next six months? Where is the best market
for a new service? Data mining works by using modeling. A model is
created out of current information, and then it is projected onto another
situation where the information does not yet exist. It predicts using the
reasoning tools of artificial intelligence: neural networks, decision
trees, if-then rules, genetic algorithms, and the nearest neighbor
method. What makes data mining so valuable is that it provides
predictive analysis. In the process, asking the right question is crucial
because the answer often reveals...
(c) 2004 Business Wire. All rts. reserv.

00595886 2001 1004 277B84 34 (USE FORMAT 7 FOR FULLTEXT) 

SPSS BI Expands Global Presence With New Releases of Leading Data Mining
Tools; Local-language versions of Clementine 6.0 and AnswerTree 3.0 now
available

Business Wire 

Thursday, October 4, 2001 09:00 EDT 

JOURNAL CODE: BW LANGUAGE: ENGLISH RECORD TYPE: FULLTEXT 
DOCUMENT TYPE: NEWSWIRE 
WORD COUNT: 726 

. . . zooming and printing, and an easier to use production mode.
About AnswerTree

AnswerTree is a data mining tool that creates decision trees for
profiling groups and predicting how these groups will respond to
marketing and sales offerings. AnswerTree's advanced analytics describe
how customer or citizen groups differ. Using that information, the
software can then predict their tendencies to respond one way or another
to promotions or programs. AnswerTree provides users with the most
decision tree algorithms in one tool, a visual tree for understanding
groups better, evaluation graphs for better understanding of model
performance and a scalable architecture for mining large data sets.

About Clementine

Clementine, SPSS' enterprise-strength data mining workbench, helps
businesses improve the profitability. . .
The business value of data warehousing. (Technology Information)

Spencer, Tricia; Blahuta, Donnelle 
Enterprise Systems Journal, v12, n6, p40(4)
June, 1997 

ISSN: 1053-6566 LANGUAGE: English RECORD TYPE: Fulltext; Abstract 

WORD COUNT: 2801 LINE COUNT: 00245 



. . . processes can be more efficient by providing suppliers with access
to current inventory information.

Data Mining

The data warehouse creates new opportunities for an organization
that were simply not feasible with highly fragmented data. Advances in
data mining can provide significant insights into a business. For
example, early adopters in the financial and retail industries are using
advanced data analysis for predicting customer behavior, fraud detection
and market-basket analysis. The application of data mining techniques,
such as neural networks, decision trees and data visualization,
will help organizations in all industries tap into the potential of the
data warehouse.

Mass Customization

Opportunities in the area of market segmentation to the point of mass
customization. . .
Data mining today. (includes product directory and related article on
current data mining implementations) (Buyers Guide) 

Brooks, Peter 
Feb, 1997

DOCUMENT TYPE: Buyers Guide ISSN: 1041-5173 LANGUAGE: English 

RECORD TYPE: Fulltext; Abstract 

WORD COUNT: 3540 LINE COUNT: 00293 

. . . available.

XpertRule Profiler by Attar Software Ltd. uses a rule induction
process to create a decision tree that identifies which factors affect
the desired outcome. An easy-to-understand Decision Tree View shows
the number of database records and frequency of the desired outcome in
each decision tree node. WizRule from WizSoft Inc. uses a proprietary
mathematical algorithm to discover every rule under investigation in a
relatively short time. Angoss Software International Ltd.'s KnowledgeSeeker
specializes in market segmentation and target marketing.

The key benefit of rules-based data mining approaches is that they
are relatively easy. . .
SPSS Data Mining Tools Lead KDnuggets Poll for Second Year in a Row.

Business Wire, p2422 
Nov 26, 2001 

Language: English Record Type: Fulltext 
Document Type: Newswire; Trade 
Word Count: 847

. . . British Telecommunications, Unilever, Provident Financial and
e-Dialog.

About AnswerTree

AnswerTree is SPSS Inc.'s data mining tool that creates
decision trees for profiling groups and predicting how these groups
will respond to marketing and sales offerings. AnswerTree's advanced
analytics describe how customer or citizen groups differ. Using that
information, the software can then predict their tendencies to respond
one way or another to promotions or programs. AnswerTree provides users
with the most decision tree algorithms in one tool, a visual tree for
understanding groups better, evaluation graphs for better understanding of
model performance and a scalable architecture for mining large data
sets.

About SPSS BI

SPSS BI, a division of SPSS Inc., helps people solve business...
...now allows these data warehouses to become highly productive
customer information repositories."

Darwin puts powerful data mining techniques in the hands of
general business users and experienced analysts alike. Easy-to-use wizards
automate data mining, while providing advanced users with full control
over all options and parameters. Darwin combines advanced analytics,
including neural networks, decision trees, and memory-based reasoning,
with unmatched power and price performance. The one-button model-code
generation, powerful scripting language and robust software development kit
bring predictive forecasting capabilities to sales, call center,
marketing and collections organizations. Darwin runs on Sun and HP servers
and exports data mining models in C, C++ and Java for execution within
Oracle databases. A Windows NT release is planned for later this year.

Oracle Corporation is the world... 
Building on its market leadership in data analysis, SAS extends its
mining solution to business professionals. 

Business Wire, p03170313 
March 17, 1997 

Language: English Record Type: Fulltext 
Document Type: Newswire; Trade 
Word Count: 883 

Business technologists, that is, marketing analysts and other
business-unit decision makers who want to predict consumer behavior or
perform other powerful analyses, currently have few options other than
turning to quantitative experts or using single-purpose, shrink-wrapped
software currently on the market. Such packages might allow the business
decision maker to create a decision tree, for example. But SAS
Institute's solution, with its visual data mining GUI for business
users, is the first to combine micro-mining ease of use with macro-mining
analytical depth. The product will offer a complete range of algorithms:
decision trees, clustering, neural networks, data mining regression,
and associations. Users will be able, in an automated fashion, to compare
different modeling. . .
...now allows these data warehouses to become highly productive
customer information repositories."

Darwin puts powerful data mining techniques in the hands of
general business users and experienced analysts alike. Easy-to-use wizards
automate data mining, while providing advanced users with full control
over all options and parameters. Darwin combines advanced analytics,
including neural networks, decision trees, and memory-based reasoning,
with unmatched power and price performance. The one-button model-code
generation, powerful scripting language and robust software development kit
bring predictive forecasting capabilities to sales, call centre,
marketing and collections organisations. Darwin runs on Sun and HP servers
and exports data mining models in C, C++ and Java for execution within
Oracle databases. A Windows NT release is planned for later this year.

Oracle Corporation is the world. . . 
...mining methodology, to delivering results through powerful
presentation features. This process reveals trends, explains known
outcomes, predicts future outcomes, and identifies factors that can
secure a desired effect. Generating meaningful results through data mining

... SAS/IntrNet (TM) software, which liberally incorporates Java technology. 
In this context, the Institute's market leading analytic strengths 
complement Sun's leadership in providing Web technologies. 
About SAS Institute Now. . . 
... data mining, AnswerTree, developed by SPSS Inc., enables users to
easily find segments, build profiles, predict outcomes and discover
patterns in data. Automatically producing an intuitive tree diagram, this
multi-method classification. . .

...those who need to identify key groups in their data, for example, credit
risk scoring, database marketing, institutional research and crime
analysis. AnswerTree helps database marketers build profiles of key
customers while users involved in direct mailings can easily identify who...


22/3, K/38 (Item 1 from file: 674) 

DIALOG (R) File 674: Computer News Fulltext
(c) 2004 IDG Communications. All rts. reserv.

053165 

Mining your business 

Data mining tools automatically extract ...
present results in easy-to-use formats.

Byline: Jesus Mena 

Journal: Network World Page Number: 

Publication Date: July 15, 1996 
Word Count: 750 Line Count: 72 

Text:

As employees accumulate information in databases, spreadsheets and
similar software used in routine business transactions, they're archiving
knowledge that can help a company be more effective and competitive, given
the application of new automatic data mining technology. Data mining
is a methodology for using software to analyze vast amounts of database
records for the purpose of discovering patterns. Unlike database query
programs, report generators or statistical packages, data mining tools
perform their analyses automatically. Given a set of thousands of database
records, data mining software searches for a pattern and rule to
describe them. By exposing a set of records from a customer database of
individuals who bought a product and those who did not, for example, data
mining software can derive a set of what-if statements. These what-if
statements come in...

... 22% and 78%, then 65.9% of the potential clients will likely buy the
product. Data mining technology also can be used to discover
associations in the form of purchasing patterns. A supermarket retailer,
for instance, might discover from its bar code database that 88% of the
customers who buy more than $100 worth of groceries, including deli items,
purchase expensive wine. For systems administrators, data mining tools
can be used to find patterns in network log files. The technology is based
on years of research in machine-learning algorithms that automate the
process of finding predictive intelligence in large databases.
Questions that traditionally required extensive manual trial-and-error
queries or statistical segmenting can now. . .
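The supermarket example is an association rule, and its "88%" figure is the rule's confidence. The sketch below shows how support and confidence for such a rule could be computed from transaction records. The transaction layout, the $100-with-deli predicate and the sample data are illustrative assumptions.

```python
from typing import Callable, List


def rule_stats(transactions: List[dict],
               antecedent: Callable[[dict], bool],
               consequent: Callable[[dict], bool]):
    """Return (support, confidence) for the rule antecedent -> consequent.
    support: share of all transactions satisfying both sides;
    confidence: share of antecedent transactions that also satisfy the consequent."""
    lhs = [t for t in transactions if antecedent(t)]
    both = sum(1 for t in lhs if consequent(t))
    support = both / len(transactions) if transactions else 0.0
    confidence = both / len(lhs) if lhs else 0.0
    return support, confidence


def big_deli_basket(t: dict) -> bool:
    return t["total"] > 100 and "deli" in t["items"]


def buys_wine(t: dict) -> bool:
    return "wine" in t["items"]


# Hypothetical point-of-sale transactions
transactions = [
    {"total": 120.0, "items": {"deli", "wine", "bread"}},
    {"total": 150.0, "items": {"deli", "wine"}},
    {"total": 110.0, "items": {"deli", "milk"}},
    {"total": 40.0, "items": {"wine"}},
    {"total": 130.0, "items": {"deli", "wine", "cheese"}},
]
support, confidence = rule_stats(transactions, big_deli_basket, buys_wine)
print(f"support={support:.0%} confidence={confidence:.0%}")
```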

... from the data. The supermarket retailer, for instance, could use the
associations found by the data mining tool to decide how and where to
stock and market premium wines.

Smart decision making

One of the key advantages of data mining technology is that it automates the
extraction of knowledge from databases and presents results in usable
business statements, without requiring guesswork or extensive expertise in
statistics. Using data mining, any company can potentially discover
what attributes or combination of attributes differentiate buyers and
nonbuyers, for example. What's more, companies can also identify key
intervals in a database relevant to classification. In the case of the
supermarket retailer, a data mining analysis may also discover that
time and dollar ranges are important influences on the outcome for
targeting potential buyers of a product. Data mining technology also
can be viewed as a simplifier. It enables the compression of a database
with hundreds and even thousands of data fields to only a few significant
ones for predicting an outcome. By analyzing the time or dollar ranges
in its customer database, for instance, the retailer could project when a
customer becomes a good prospect for a specific product or service.
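Finding the "time and dollar ranges" that matter amounts to searching for cut points on a numeric field. Below is a minimal sketch that assumes a binary buyer/non-buyer label and uses entropy-based information gain. That is one common criterion; the article does not say which one the tools use. The spending values are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def best_threshold(values, labels):
    """Return the cut point on a numeric field that maximizes information gain
    for a binary label, together with that gain (None, 0.0 if no cut helps)."""
    pairs = sorted(zip(values, labels))
    base = entropy([y for _, y in pairs])
    best_cut, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_cut, best_gain = (pairs[i - 1][0] + pairs[i][0]) / 2, gain
    return best_cut, best_gain


# Hypothetical dollars spent per customer and whether they bought the product
dollars = [20, 35, 50, 80, 95, 120, 140, 160]
bought = [0, 0, 0, 0, 1, 1, 1, 1]
cut, gain = best_threshold(dollars, bought)
print(f"split at ${cut}, information gain {gain:.3f} bits")
```

Applying the search recursively to each side of the cut yields the nested intervals that a decision tree reports.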

Practically speaking

Data mining technology freely mixes numeric,
categorical and date variables, and is quite robust and tolerant of missing
or noisy data. And because data mining techniques can view an entire
database without preconceived notions about which portions would be most
relevant, they allow for the discovery. . .

... factors. Rather than relying on an analyst's or a statistician's
intuition or guesswork, data mining tools themselves discover
relationships. Data mining can optimize business conditions by
providing answers on key bottom-line questions, such as the...
. . . conditions is trouble on the network most likely to occur? Savvy
corporations already are using data mining technology to develop
marketing strategies, target mailings, adjust inventories, minimize risk
and eliminate wasteful spending. The methodology can answer business
questions that historically have been too time-consuming to resolve or find
predictive information that was once overlooked because it resided
outside of traditional business expectations. A variety of data mining
tools are available, including Knowledge Seeker from Angoss Software,
Profiler from Attar Software and Clementine...

...Integral Solutions, Ltd. The tools are specifically designed to discover
significant relationships among variables in databases, and most generate
rules and decision trees. They run the gamut from small, stand-alone
tools that cost less than $2,000...

...with price tags in excess of $50,000. Mena is a principal at IceBreaker,
a data mining services consulting firm in Alameda, Calif., and on the
Internet at http://www.icemfg.com...
Document Type: Newsletter; Trade
Word Count: 214 

NEW YORK, USA - Business Objects has developed a major new open-standards
data mining initiative, which it has launched with a number of
business partners to deliver and integrate...

... co-marketing partnerships to work together to provide software
integration between BusinessObjects 4 and the partners' data mining
solutions. The new partners are: ANGOSS Software, makers of KnowledgeSEEKER,
a knowledge discovery and data mining product that provides
analytical and predictive capabilities; DataMind, makers of DataMind,
a family of data mining software designed specifically for business
professionals; IBM, providers of the IBM Intelligent Miner, a knowledge
discovery product for analysing, extracting and visualising data in
databases and data warehouses; ISoft, makers of AC, a data mining
product for creating decision trees; Right Information Systems, makers
of 4Tune, a modeling and forecasting product for business people
without specific statistical skills; Silicon Graphics, providers of MINESET
data mining and data visualisation software; and SPSS, a provider
of statistical analysis software used by professional data analysts for
applications including survey research and marketing and sales analysis.
COPYRIGHT 1996 M2 Communications 
M2 Presswire, pN/A
Language: English Record Type: Fulltext
Document Type: Newswire; Trade 
Word Count: 720 

... and to make predictions using that information. 

ISoft's AC and Alice are high-profile data mining products for
exploring databases through interactive decision trees and creating
queries, reports, charts, and even rules for predictive models. Since its
first release in 1990, AC has emerged in the European market as a highly
successful data mining tool on the Unix and PC platforms. Available in
1996, Alice introduces major breakthroughs in terms of user-friendliness
for data mining on the PC.

"We are pleased to be partnering with ISoft, and to offer our... 
... reports the associated P-value.

ANGOSS KnowledgeSEEKER is a leading artificial intelligence data
analysis and prediction tool that offers a unique solution for business
analysis and decision support applications, ranging from database
marketing to forecasting and work process control. KnowledgeSEEKER
delivers critical decision support based on operational data by exposing...

...effect relationships. Results are delivered extremely rapidly and in the
form of easy-to-grasp decision trees.

ANGOSS Software is a publicly traded international company (Alberta 
Stock Exchange traded under the symbol... 
WORD COUNT: 5973

...TEXT: mainly read-only access; 

* limited, but indeed controlled, degree of up-to-dateness. 

In the Data-Warehouse approach (shown in Figure 2), data are extracted
from the component databases and integrated in the data warehouse in
an off-line fashion. Of course, this makes updates a problematic task;
however, read-only access is granted, with great transparency and
flexibility. The applications supported by a data warehouse are
typically oriented to decision support (for marketing, sales, financial
analysis), investigation, and summarization. This architecture has
attracted great interest in the marketplace (OLAP, data cube, and
multidimensional database technologies (Chaudhuri and Dayal, 1997)).
Figure 2:

Figure 3: 

Multidatabases vs. Data-Warehouses 
We mention. . . 
...TEXT: of Sunnyvale, California, appears to have found a way around the
downloading problem. The company markets online analytical
processing (OLAP) database products for mission-critical analysis of
actual and projected enterprise performance data. OLAP software allows for
dynamic, multidimensional, what-if analysis. Such capabilities are not
provided by current database software based on the relational model.
Arbor's main product, Essbase Analysis Server (Essbase), is a
multidimensional database that is extremely fast and efficient because of
its ability to define the organization's reporting structure in the
database. Basically, Essbase allows the company to paint a picture of
what it wants.

Essbase commonly... 
(USE FORMAT 7 FOR FULLTEXT)
TEXT: 

...organization's existing investment in relational technology,
eliminating the need for expensive and proprietary multidimensional data
base management systems. By eliminating the need for these specialized
data structures, the OLAP++ solution can reduce the cost and complexity of
data warehousing projects requiring multidimensional analysis of data.
"As the world's leading provider of business analysis...

...support systems," said Barrett Joyner, SAS Institute's vice president
of North American sales and marketing. "OLAP++ is an extension of our
existing OLAP offering and we are excited to make it...

...libraries can obtain and analyze data from various sources without a 
need for a separate data base for OLAP. As a result, the models are 
dynamic and automatically pass the most recent... 

...these tools, enabling the benefits of multidimensional analysis without
the necessary cost of a separate data base. This offering simply makes
it easier to make use of SAS business intelligence facilities in...

...present their data within an applications development environment.
Capabilities within the SAS System include EIS, data warehousing,
client/server computing, database access, applications development, 
graphics, data storage and analysis, report writing, quality improvement, 
project management, computer. . . 
processes. Business objects structure the business domain model into
marketing strategy supported with the results of the analysis. 5 Refs.

Descriptors: *Data mining; Database systems; Marketing; Data
reduction; Strategic planning; Software engineering; Algorithms

Identifiers: Data selection; Demographic algorithms; Data sets; 
Marketing strategy 

Classification Codes: 

723.2 (Data Processing); 723.3 (Database Systems); 911.4 (Marketing); 
912.2 (Management); 723.1 (Computer Programming) 

723 (Computer Software, Data Handling & Applications); 911 (Cost &
Value Engineering; Industrial Economics); 912 (Industrial Engineering &
Management)

72 (COMPUTERS & DATA PROCESSING); 91 (ENGINEERING MANAGEMENT) 

20/5/11 (Item 1 from file: 99) 

DIALOG (R) File 99: Wilson Appl. Sci & Tech Abs
(c) 2004 The HW Wilson Co. All rts. reserv. 

1229398 H.W. WILSON RECORD NUMBER: BAST95024971 

OLAP answers tough business questions 
The, Lee; 

Datamation v. 41 (May 1 '95) p. 65-7 

DOCUMENT TYPE: Feature Article ISSN: 0011-6963 LANGUAGE: English 
RECORD STATUS: New record 

ABSTRACT: On Line Analytical Processing (OLAP) servers help
users get the information they need from their databases to make
business decisions and free IS professionals from spending a lot of time
generating reports. According to Howard Dresner, research director at the
Gartner Group, users need OLAP tools if they spend more than 20 percent
of their time analyzing data and the data are compared across more than
two dimensions (such as business units, geographical areas, products,
industries, market segments, and distribution channels). OLAP tools
also make it easier for users to do analyses that cross departmental and
corporate boundaries. The way in which OLAP servers work is discussed,
and information on several OLAP tools is provided.
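As a rough illustration of what comparing data across more than two dimensions involves, the sketch below rolls up a tiny in-memory fact table along any chosen subset of dimensions and takes a slice on one of them. It is a minimal sketch under stated assumptions, not how any particular OLAP server works; the dimension names (region, product, channel), the measure (sales) and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: each row holds dimension values plus a numeric measure.
FACTS = [
    {"region": "East", "product": "A", "channel": "Retail", "sales": 120.0},
    {"region": "East", "product": "B", "channel": "Web", "sales": 80.0},
    {"region": "West", "product": "A", "channel": "Web", "sales": 50.0},
    {"region": "West", "product": "A", "channel": "Retail", "sales": 70.0},
    {"region": "West", "product": "B", "channel": "Retail", "sales": 30.0},
]


def rollup(rows, dims, measure):
    """Sum `measure` over every combination of values of the chosen dimensions.

    Fewer dimensions give coarser cells (roll-up); more give finer cells
    (drill-down). An empty `dims` list yields the grand total under key ().
    """
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        cube[key] += row[measure]
    return dict(cube)


def slice_cube(rows, **fixed):
    """Keep only rows whose dimensions match every fixed value (a 'slice')."""
    return [r for r in rows if all(r[d] == v for d, v in fixed.items())]


if __name__ == "__main__":
    print(rollup(FACTS, ["region"], "sales"))  # {('East',): 200.0, ('West',): 150.0}
    print(rollup(FACTS, ["region", "product", "channel"], "sales"))
    print(rollup(slice_cube(FACTS, channel="Retail"), ["product"], "sales"))
```

Scanning every row on each query keeps the sketch short; an OLAP server would typically precompute or index such aggregates so that drilling down and rolling up stay fast on large data.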



DESCRIPTORS: Database design; File servers; 
25/5/1 (Item 1 from file: 2)

DIALOG (R) File 2 : INSPEC 

(c) 2004 Institution of Electrical Engineers. All rts. reserv. 
6210429 

Title: Increasing customer value by integrating data mining and 
campaign management software 

Author(s): Frawley, A.; Thearling, K.

Author Affiliation: Exchange Applications, Boston, MA, USA 
Journal: Direct Marketing vol.61, no. 10 p. 49-53
Publisher: Hoke Communications, 

Publication Date: Feb. 1999 Country of Publication: USA 

CODEN: DIMADI ISSN: 0012-3188 

SICI: 0012-3188(199902)61:10L.49:ICVI;1-J

Material Identity Number: B756-1999-003 

Language: English Document Type: Journal Paper (JP) 

Treatment: Practical (P) 

Abstract: To be successful, database marketers must, first, identify
market segments containing customers or prospects with high profit
potential and, second, build and execute campaigns that favorably impact
the behavior of these individuals. The first task, identifying market
segments, requires significant data about prospective customers and their
buying behaviors. In theory, the more data the better. In practice,
however, massive data stores often impede marketers, who struggle to sift
through the minutiae to find the nuggets of valuable information. Recently,
marketers have added a new class of software to their targeting arsenal:
data mining applications. These software applications automate the
process of searching the mountains of data to find patterns that are good
predictors of purchasing behaviors. After mining the data, marketers
must feed the results into campaign management software that, as the name
implies, manages the campaign directed at the defined market segments.
In the past, the link between data mining and campaign management
software was mostly manual. In the worst cases, it involved "sneaker net":
creating a physical file on tape or disk, which someone then carried to
another computer, where they loaded it into the marketing database.
This separation of the data mining and campaign management software
introduces considerable inefficiency and opens the door for human errors.
Tightly integrating the two disciplines presents an opportunity for
companies to gain competitive advantage. (0 Refs)
Subfile: D 

Descriptors: data mining; integrated software; marketing; very large
databases

Identifiers: customer value; data mining software; campaign
management software; database marketers; market segment
identification; prospective customers; buying behavior; massive data stores;
automated data searching; pattern finding; competitive advantage

Class Codes: D2140 (Marketing, retailing and distribution); D2080
(Information services and database systems)

Copyright 1999, IEE 
DIALOG (R) File 2 : INSPEC

(c) 2004 Institution of Electrical Engineers. All rts. reserv. 

8017614 INSPEC Abstract Number: C2004-08-1160-070

Title: A genetic algorithm-based approach for building accurate decision
trees

Author(s): Zhiwei Fu; Golden, B.L.; Lele, S.; Raghavan, S.; Wasil, E.A.
Author Affiliation: Fannie Mae, Washington, DC, USA
Journal: INFORMS Journal on Computing vol.15, no. 1 p. 3-22
Publisher: INFORMS, 

Publication Date: Winter 2003 Country of Publication: USA 

CODEN: OJCOE3 ISSN: 0899-1499 

SICI: 0899-1499(200324)15:1L.3:GABA;1-1

Material Identity Number: F156-2003-001 

U.S. Copyright Clearance Center Code: 0899-1499/03/1501/0003$05.00
Language: English Document Type: Journal Paper (JP) 
Treatment: Practical (P) 

Abstract: In dealing with a very large data set, it might be impractical
to construct a decision tree using all of the points. Even when it is
possible, this might not be the best way to utilize the data. As an
alternative, subsets of the original data set can be extracted, a tree
can be constructed on each subset, and then parts of individual trees can
be combined in a smart way to produce an improved final set of feasible
trees or a final tree. In this paper, we take trees generated by a
commercial decision tree package, namely, C4.5, and allow them to
crossover and mutate (using a genetic algorithm) for a number of
generations in order to yield trees of better quality. We conduct a
computational study of our approach using a real-life marketing data set.
In this study, we divide the data set into training, scoring, and test
sets, and find that our approach produces uniformly high-quality decision
trees. In addition, we investigate the impact of scaling and demonstrate
that our approach can be used effectively on very large data sets. (30
Refs)
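The abstract does not spell out the authors' genetic operators, so the following is only a hypothetical sketch of the general idea: decision trees are encoded as nested tuples, crossover swaps randomly chosen subtrees between two parents, mutation shifts one split threshold, and selection keeps the more accurate half of the population each generation, so the best tree found is never lost. The hand-written seed trees in the test stand in for trees a package such as C4.5 would produce, and fitness here is plain accuracy on one data set rather than the training/scoring/test protocol the paper describes.

```python
import random

# Hypothetical encoding: a leaf is a class label (str); an internal node is
# (feature_index, threshold, left, right), and x goes left if x[feature] <= threshold.
LEFT, RIGHT = 2, 3  # tuple positions of the two children


def predict(tree, x):
    """Follow splits from the root down to a leaf and return its label."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


def accuracy(tree, data):
    """Fitness: fraction of (x, label) pairs the tree classifies correctly."""
    return sum(predict(tree, x) == y for x, y in data) / len(data)


def paths(tree, prefix=()):
    """Yield the path (a tuple of LEFT/RIGHT positions) to every node."""
    yield prefix
    if isinstance(tree, tuple):
        yield from paths(tree[LEFT], prefix + (LEFT,))
        yield from paths(tree[RIGHT], prefix + (RIGHT,))


def get(tree, path):
    """Return the subtree reached by following `path` from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(node)


def crossover(a, b, rng):
    """Swap one randomly chosen subtree of `a` with one of `b`."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))


def mutate(tree, rng, scale=0.5):
    """Shift the threshold of one randomly chosen internal node."""
    internal = [p for p in paths(tree) if isinstance(get(tree, p), tuple)]
    if not internal:
        return tree  # a bare leaf has no threshold to mutate
    p = rng.choice(internal)
    feature, threshold, left, right = get(tree, p)
    return replace(tree, p, (feature, threshold + rng.uniform(-scale, scale), left, right))


def evolve(population, data, generations, rng):
    """Elitist GA: keep the fitter half and refill with mutated offspring.

    Requires at least two trees in `population`; the caller's list is not modified.
    """
    population = list(population)
    for _ in range(generations):
        population.sort(key=lambda t: accuracy(t, data), reverse=True)
        survivors = population[: max(2, len(population) // 2)]
        children = []
        while len(survivors) + len(children) < len(population):
            a, b = rng.sample(survivors, 2)
            children.extend(mutate(c, rng) for c in crossover(a, b, rng))
        population = survivors + children[: len(population) - len(survivors)]
    return max(population, key=lambda t: accuracy(t, data))
```

Because the survivors always include the current best tree, the returned tree is at least as accurate on `data` as the best seed; the paper's concern with generalization would require scoring candidates on held-out data instead.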

Subfile: C 

Descriptors: decision trees; genetic algorithms; very large
databases

Identifiers: genetic algorithm; decision tree; C4.5 decision tree
package; data sets

Class Codes: C1160 (Combinatorial mathematics); C1180 (Optimisation 
techniques); C6160Z (Other DBMS) 

Copyright 2004, IEE 
Set     Items   Description
---     -----   -----------
S1       9862   DATABASE? OR DATA()BASE? ? OR DATA(2N)(WAREHOUS? OR WARE()HOUS?
                OR MINE? ? OR MINING?) OR DATAMIN? OR DB OR DBS OR DATABANK?
                OR DATA()BANK? OR DATAFILE? ? OR DATA()FILE? ? OR RDBMS OR RDB
                OR RDBM OR OODB OR O()O()D()B OR R()D()B()M
S2        343   (MARKET OR BUSINESS)(2N)(SEGMENT? OR SECTION? OR GROUP? OR
                PORTION? OR CLUSTER?)
S3        575   (SPLIT???? OR DIVIDE? OR DIVISION? OR SUBSET? ? OR SUB()SET? ?
                OR SEGREGATE? OR SEPARATE? OR LIST??? OR ITEMIZE)(3N)
                (VARIABLE? OR MARKET? OR CATEGORY OR CATEGORIES OR
                CHARACTERISTIC? OR DIMENSION? OR FEATURE?)
S4        209   (SPLIT???? OR DIVIDE? OR DIVISION? OR SUBSET? ? OR SUB()SET? ?
                OR SEGREGATE? OR SEPARATE? OR LIST??? OR ITEMIZE)(3N)
                (SEGMENT? OR SECTION? OR GROUP? OR PORTION? OR CLUSTER?)
S5         47   DECISION()TREE? ?
S6      31380   DATA OR RECORD? ? OR STAT OR STATS OR STATISTICS OR INFORMATION?
S7        288   OPAL OR (ONLINE OR ON()LINE)()ANALYTICAL()PROCESS? OR OLAP
S8         64   S1 AND S2
S9          4   S8 AND S3
S10         0   S8 AND S4
S11         3   S8 AND S7
S12         0   S5 AND S8
S13        12   S5 AND MARKET?
DIALOG (R) File 256: TecInfoSource

(c) 2004 Info. Sources Inc. All rts. reserv. 

00116246 DOCUMENT TYPE: Review 

PRODUCT NAMES: NovaView 1.0 (736317) 

TITLE: OLAP for Dummies 
AUTHOR: Schumacher, Robin 

SOURCE: Intelligent Enterprise, v2 n6 p52(2) Apr 20, 1999 
ISSN: 1524-3621 

HOMEPAGE: http://www.intelligententerprise.com

RECORD TYPE: Review 
REVIEW TYPE: Review 
GRADE: A 

Cognos' NovaView 1.0, an online analytical processing (OLAP) system
suitable for OLAP beginners, allows users to view information at the
global and corporate levels, to filter data, and to drill down to various
levels of the information. Data can involve geographical territories,
product groups, and business units. NovaView, an OLAP interface, is
included in the Microsoft SQL Server 7.0 engine. Testers of the NovaView
client found that Microsoft's OLAP Services is the OLAP engine, which
means it will be pre-installed with the related NT service in operation.
Installation of NovaView was easy and quick. NovaView first presents a
demonstration application with which users can experiment. Information is
shown in cross-tab formats and graphs, and internal views show data from
multidimensional cubes. Users can position and manipulate data by creating
an application that acts as a container for business units or areas of
interest for analysis. NovaView has a rich feature set and a selection of
OLAP tools that would be adequate for most decision makers. Using NovaView
proved to be easy and intuitive, and security features allow users to lock
views. Another sophisticated feature is drill-through, which can be
deployed with Microsoft Visual Basic or Visual C++.

PRICE: $395 

COMPANY NAME: Cognos Inc (027294)
SPECIAL FEATURE: Screen Layouts Charts 

DESCRIPTORS: C++; Database Management; Database Servers; Decision 
Support Systems; IBM PC & Compatibles; Information Retrieval; SQL; 
Visual Basic; Windows NT/2000 

REVISION DATE: 20030228 
DIALOG (R) File 256: TecInfoSource

(c) 2004 Info. Sources Inc. All rts. reserv. 

00116914 DOCUMENT TYPE: Review 

PRODUCT NAMES: Darwin (613932); MindSet (754471); Enterprise Miner 
(669318) 

TITLE: Mining Your Business 

AUTHOR: Deck, Stewart 

SOURCE: Computerworld, v33 n20 p94(4) May 17, 1999 
ISSN: 0010-4841 

HOMEPAGE: http://www.computerworld.com

RECORD TYPE: Review 

REVIEW TYPE: Product Analysis 

GRADE: Product Analysis, No Rating 

Thinking Machines' Darwin, Silicon Graphics' MindSet, SAS Institute's SAS
Enterprise Miner, and SPSS' segmentation, decision tree, regression
analysis, and neural modeling tools are highlighted in a description of
Fingerhut's, Axios Data Analysis Systems', and Vermont Country Store's (and
other firms') use of data mining software. The users respectively create
specialized catalogs and optimize mailings; analyze data warehouses to
provide enlightenment from data; and learn more about customers in order
to increase catalog mailings and sales. These users mine their own data
and have found multiple ways to ensure success. For example, granular,
clean data is important, as are knowing the tools needed and having a
small, expert staff that handles model building. Fingerhut uses SAS' and
SPSS' tools with an IBM DB2 RDBMS to create a segmentation model and a
mailstream optimization model; the latter shows which customers are likely
to buy products in existing catalog mailings. Health care analyst Axios
uses Darwin, which provides a wide range of mining models that are easily
integrated with an automated system and ported to Java, and MindSet, which
allows data visualization. For Vermont Country Store, Enterprise
Miner provides regression, neural network, and decision tree analysis
for buying patterns and responses. The software allows Vermont Country
Store to home in on seasonal shopping trends and particular product
categories that target particular customers.

COMPANY NAME: Oracle Corp (010740); Silicon Graphics Inc (435201); SAS
Institute Inc (016021)
SPECIAL FEATURE: Charts Tables 

DESCRIPTORS: Catalogs; Data Mining; Data Warehouses; Decision Support
Systems; Information Retrieval; Internet Marketing; Market Research;
Regression Analysis; Retailers

REVISION DATE: 20021130 
00114415 DOCUMENT TYPE: Review

PRODUCT NAMES: CART (739758) 

TITLE: Salford Systems and Fleet: Understanding Customer Characteristics 

AUTHOR: Staff 

SOURCE: PC AI, v13 n1 p39(2) Jan/Feb 1999
ISSN: 0894-0711 

HOMEPAGE: http://www.pcai.com/pcai

RECORD TYPE: Review 

REVIEW TYPE: Product Analysis 

GRADE: Product Analysis, No Rating

CART from Salford Systems is a data-mining application that can assist 
banks and other financial institutions in gathering information about 
banking customers and creating finely tuned product and service promotions. 
Using a decision tree to display data results, users can easily 
understand the interactions among variables. Historical customer data is 
collected first with CART, then in the same environment, users can create 
models made from 'massaged' customer information that is merged into 
datasets and output as standard text files. These text files can be fed 
into various modeling tools using the CART interface to create 
logistic-regression models for illustrating a bank's overall customer 
landscape. CART ultimately provides banks with models of their best 
customers by predicting the expected balance they will eventually carry. 
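CART (Classification and Regression Trees) grows binary trees, and for a numeric target such as a customer's expected balance its regression trees choose each split to minimize the within-node sum of squared errors. The sketch below shows that least-squares split search for a single numeric predictor; it is not Salford Systems' implementation, and the predictor (years as a customer) and balances used in the test are hypothetical.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(xs, ys):
    """Find the threshold t minimizing SSE(left) + SSE(right) for splits x <= t.

    Candidate thresholds are midpoints between consecutive distinct x values.
    Returns (None, SSE of the unsplit node) if no split reduces the error.
    """
    pairs = sorted(zip(xs, ys))
    best_t, best_err = None, sse(list(ys))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical x values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        err = sse(left) + sse(right)
        if err < best_err:
            best_t, best_err = (lo + hi) / 2, err
    return best_t, best_err


def leaf_means(xs, ys, threshold):
    """Predicted value in each child node: the mean target on either side."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return sum(left) / len(left), sum(right) / len(right)
```

Growing a full tree repeats this search in each child node and across all predictors; CART then prunes the grown tree back, a step this sketch omits.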

COMPANY NAME: Salford Systems (659576)
SPECIAL FEATURE: Screen Layouts 

DESCRIPTORS: Artificial Intelligence; Banks; Data Mining; Decision Support
Systems; Financial Institutions; Market Research; Sales Analysis
REVISION DATE: 19990430 